This notebook improves the quality of the collected news dataset by adding an SBERT-based similarity score for each similar/related headline.



*   SBERT (Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks) transforms the reference headline and each similar headline into vector representations. Cosine similarity between the two vectors then gives the similarity score of each similar/related headline against the reference headline.

*   This similarity score is used to filter out less relevant similar/related headlines, which helps produce a higher-quality news dataset.
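As a minimal sketch of the scoring step described above, cosine similarity can be written out by hand. The three short vectors below are made-up stand-ins for SBERT embeddings (which have several hundred dimensions), and `cosine_similarity` is a hypothetical helper name, not part of the notebook.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors standing in for headline embeddings.
ref_vec = [1.0, 2.0, 0.0]
close_vec = [2.0, 4.0, 0.0]   # same direction as ref_vec, so the score is close to 1
far_vec = [0.0, 0.0, 3.0]     # orthogonal to ref_vec, so the score is 0

print(cosine_similarity(ref_vec, close_vec))
print(cosine_similarity(ref_vec, far_vec))
```

The score depends only on the direction of the vectors, not on their length, which is why a longer headline with the same meaning can still score close to 1.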



In [None]:
'''
mount the Google Drive folder
'''
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
#SBERT for sentence-level similarity analysis
#https://github.com/UKPLab/sentence-transformers
#https://www.sbert.net/docs/usage/semantic_textual_similarity.html

In [None]:
pip install -U sentence-transformers

Collecting sentence-transformers
  Downloading https://files.pythonhosted.org/packages/35/aa/f672ce489063c4ee7a566ebac1b723c53ac0cea19d9e36599cc241d8ed56/sentence-transformers-1.0.4.tar.gz (74kB)
Collecting transformers<5.0.0,>=3.1.0
  Downloading https://files.pythonhosted.org/packages/d8/b2/57495b5309f09fa501866e225c84532d1fd89536ea62406b2181933fb418/transformers-4.5.1-py3-none-any.whl (2.1MB)
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/ae/04/5b870f26a858552025a62f1649c20d29d2672c02ff3c3fb4c688ca46467a/tokenizers-0.10.2-cp37-cp37m

In [None]:
'''
import required packages
'''
import glob
import pandas as pd
import numpy as np

from sentence_transformers import SentenceTransformer, util

In [None]:
'''
list the saved news data pickle files (each holds: news-url, ref-headline, news-content, author, publish-date, similar-headline-list, similar-headline-url-list)
'''
base_data_loc='/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/'
data_file_list=glob.glob(base_data_loc+'*')
data_file_list

['/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_1_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_2_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_3_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_4_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_5_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_6_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_7_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_8_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/data_9_50.df',
 '/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_a

In [None]:
individual_data_file_df_list=[pd.read_pickle(data_file) for data_file in data_file_list]
article_details_df = pd.concat(individual_data_file_df_list, axis=0, ignore_index=True)
print('article_details_df shape : ',article_details_df.shape)

article_details_df shape :  (3000, 8)


In [None]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 2000)
pd.set_option('display.float_format', '{:20,.2f}'.format)
pd.set_option('display.max_colwidth', None)

In [None]:
article_details_df.head(5)

Unnamed: 0,article_url,headline,content,author,published_date,read_more_source,similar_headline,similar_headline_url
0,https://inshorts.com/en/news/you-shall-always-have-my-support-arjun-kapoor-on-janhvis-bday-1615047490059,You shall always have my support: Arjun Kapoor on Janhvi's b'day,"Taking to Instagram on Saturday, Arjun Kapoor posted a picture of himself with Janhvi Kapoor to wish the actress on her 24th birthday. In the picture, Arjun can be seen walking ahead while holding his sister's hand. ""Happy birthday Janhvi...I can't promise much except like this picture you shall always have my support & hand wherever you go,"" Arjun wrote.",,2021-03-06T16:18:10.000Z,PINKVILLA,"[On Janhvi Kapoor’s birthday, Arjun promises to be a constant support: You shall have my hand wherever you go, ""You will always have my support,"" Arjun Kapoor writes a heartfelt birthday note for Janhvi Kapoor., 'You shall always have my support': Arjun Kapoor pens heart-warming birthday note for Janhvi, Janhvi Kapoor rings in 24th birthday with her crew, sister Anshula plans huge surprise]","[https://www.pinkvilla.com/entertainment/news/janhvi-kapoor-s-birthday-arjun-promises-be-constant-support-you-shall-have-my-hand-wherever-you-go-633968, https://indianewsrepublic.com/you-will-always-have-my-support-arjun-kapoor-writes-a-heartfelt-birthday-note-for-janhvi-kapoor/222825/, https://english.lokmat.com/entertainment/you-shall-always-have-my-support-arjun-kapoor-pens-heart-warming-birthday-note-for-janhvi/, https://www.republicworld.com/entertainment-news/bollywood-news/janhvi-kapoor-rings-in-24th-birthday-with-her-crew-sister-anshula-plans-huge-surprise.html]"
1,https://inshorts.com/en/news/gavaskar-felicitated-on-50th-anniversary-of-test-debut-1615046996400,Gavaskar felicitated on 50th anniversary of Test debut,"Ex-India captain Sunil Gavaskar was felicitated by the BCCI on the 50th anniversary of his Test debut on Saturday. The 71-year-old received a commemorative Test cap from BCCI Secretary Jay Shah, while several ex-cricketers took to Twitter to congratulate Gavaskar. ""The game is as strong as it is today because they made a start then against all odds,"" Ganguly tweeted.",,2021-03-06T16:09:56.000Z,read more source not available,"[Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated by BCCI on 50th anniversary of his Test debut, Former India captain Sunil Gavaskar felicitated by BCCI on 50th anniversary of Test debut, Sunil Gavaskar Felicitated By BCCI On 50th Anniversary Of Test Debut. 
Watch, Sunil Gavaskar felicitated by BCCI on 50th anniversary of Test debut, BCCI Felicitates Sunil Gavaskar On His 50th Anniversary Of Test Debut]","[https://www.thehindu.com/sport/cricket/sunil-gavaskar-felicitated-on-50th-anniversary-of-test-debut/article34005053.ece, https://www.deccanherald.com/sports/cricket/sunil-gavaskar-felicitated-on-50th-anniversary-of-test-debut-958854.html, https://timesofindia.indiatimes.com/sports/cricket/england-in-india/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut/articleshow/81362569.cms, https://www.sportskeeda.com/cricket/sunil-gavaskar-felicitated-by-bcci-50th-anniversary-test-debut, https://m.economictimes.com/news/sports/former-india-captain-sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut/articleshow/81362570.cms, https://sports.ndtv.com/cricket/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut-watch-2384926, https://www.indiatvnews.com/sports/cricket/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut-689160, https://cricketaddictor.com/cricket-news/bcci-felicitates-sunil-gavaskar-on-his-50th-anniversary-of-test-debut/]"
2,https://inshorts.com/en/news/pv-sindhu-to-face-carolina-marin-in-her-1st-final-since-2019-world-cships-win-1615046535615,PV Sindhu to face Carolina Marin in her 1st final since 2019 World C'ships win,"Shuttler PV Sindhu on Saturday defeated Mia Blichfeldt in straight games to enter the final of the Swiss Open, where she'll face Carolina Marin. Notably, this is Sindhu's first final appearance since winning the BWF World Championships in August 2019. In the men's singles, Kidambi Srikanth suffered defeat in the semifinal against world number two Viktor Axelsen of Denmark.",,2021-03-06T16:02:15.000Z,read more source not available,"[Carolina Marin overpowers PV Sindhu in Swiss Open final, Sindhu becomes first Indian shuttler to win World C’ships gold, bear Okuhara in final, 21 wins, 2 silvers, 2 bronzes, 1 gold: Sindhu completes the Worlds set, PV Sindhu gets past feisty Blichfeldt to make Swiss Open final, Badminton World C’ships: Carolina Marin Beats PV Sindhu in Final, PV Sindhu vs Carolina Marin final, BWF World C'ships 2018, highlights: Marin outwits Sindhu, Carolina Marin]","[https://www.olympicchannel.com/en/stories/news/detail/badminton-carolina-marin-eases-past-pv-sindhu-swiss-open-final/, https://www.thehindu.com/sport/other-sports/pv-sindhu-becomes-first-indian-shuttler-to-win-world-championships-gold/article29253013.ece, https://www.espn.com/badminton/story/_/id/27459172/sindhu-completes-worlds-set, https://www.espn.com/badminton/story/_/id/31015323/pv-sindhu-gets-feisty-blichfeldt-make-swiss-open-final, https://www.thequint.com/sports/badminton/live-updates-sindhu-vs-marin-final-badminton-world-championships, https://www.timesnownews.com/sports/badminton/article/pv-sindhu-vs-carolina-marin-final-bwf-world-championships-2018-live-badminton-score-updates-sindhu-seeks-elusive-gold-nanjing-china/264675, https://www.deccanchronicle.com/content/tags/carolina-marin]"
3,https://inshorts.com/en/news/linkedin-to-stop-collecting-tracking-ad-targeting-data-on-ios-1615052233555,"LinkedIn to stop collecting tracking, ad targeting data on iOS","LinkedIn has announced that it will stop collecting IDFA data, which is used by firms for tracking and ad targeting, on iOS for now ahead of Apple's upcoming changes to IDFA. LinkedIn said the change affects the LinkedIn Audience Network (LAN), Conversion Tracking and Matched Audiences. It added that it expects a limited impact on users' campaign performance.",,2021-03-06T17:37:13.000Z,Hindustan Times,"[LinkedIn to stop collecting tracking data after Apple’s app tracking transparency changes, LinkedIn to Stop Collecting IDFA Data from iOS Devices in respect to Apple’s App Tracking Transparency Feature / Digital Information World, LinkedIn stops collecting tracking data ahead of iOS 14 changes, LinkedIn to stop collecting tracking data before Apple shames it, Microsoft's LinkedIn will stop the collection of IDFA data in iOS app, News Break: Local News & Breaking News, LinkedIn Won't Use IDFA After Apple's App Tracking Transparency Changes, LinkedIn stops collecting tracking data ahead of iOS 14 changes, LinkedIn will stop collecting IDFA data on iOS due to Apple’s App Tracking Transparency feature]","[https://www.thehindu.com/sci-tech/technology/linkedin-to-stop-collecting-tracking-data-after-apples-app-tracking-transparency-changes/article33996177.ece, https://www.digitalinformationworld.com/2021/03/linkedin-to-stop-collecting-idfa-data.html, https://www.engadget.com/linkedin-stops-idfa-collection-before-ios-14-205531990.html, https://www.imore.com/linkedin-stop-collecting-tracking-data-apple-shames-it, https://mspoweruser.com/microsoft-linkedin-idfa-data-in-ios-app/, https://www.newsbreak.com/news/2176214961600/microsofts-linkedin-will-stop-the-collection-of-idfa-data-in-ios-app, https://www.macrumors.com/2021/03/04/linkedin-app-tracking-transparency-no-idfa/, 
https://finance.yahoo.com/news/linkedin-stops-idfa-collection-before-ios-14-205531990.html, https://9to5mac.com/2021/03/04/linkedin-idfa-data-ios/]"
4,https://inshorts.com/en/news/samsung-mastercard-to-make-fingerprintauthenticated-payment-cards-1615084208223,"Samsung, Mastercard to make fingerprint-authenticated payment cards",Samsung Electronics and Mastercard have signed an MoU to develop biometric cards that feature a built-in fingerprint scanner to authorise transactions securely at in-store payment terminals. These biometric cards will be powered by a new security chipset developed by Samsung's System LSI Business. Users will be able to use them at any Mastercard chip terminal or POS terminal.,,2021-03-07,Hindustan Times,"[Samsung and Mastercard are collaborating on a fingerprint payment card, Samsung Electronics, Mastercard and Samsung Card Sign MoU for Fingerprint Biometric Payment Card, Samsung and Mastercard teaming up on a finger-scanning card, Samsung and Mastercard Are Developing a Biometric Card, Samsung, Mastercard to launch payment cards with fingerprint sensors, Samsung and Mastercard to pilot biometric payments card in South Korea, Samsung Electronics, Mastercard and Samsung Card develop fingerprint biometric payment card]","[https://www.engadget.com/samsung-and-mastercard-are-making-a-fingerprint-payment-card-114549845.html, https://www.businesswire.com/news/home/20210304005394/en/Samsung-Electronics-Mastercard-and-Samsung-Card-Sign-MoU-for-Fingerprint-Biometric-Payment-Card, https://www.pocket-lint.com/gadgets/news/samsung/155999-samsung-mastercard-fingerprint-scanning-credit-card, https://www.pcmag.com/news/samsung-and-mastercard-are-developing-a-biometric-card, https://tech.hindustantimes.com/tech/news/samsung-mastercard-to-launch-payment-cards-with-fingerprint-sensors-71615041010868.html, https://www.zdnet.com/article/samsung-and-mastercard-to-pilot-biometric-payments-card-in-south-korea/, https://www.helpnetsecurity.com/2021/03/05/samsung-electronics-mastercard-samsung-card/]"


In [None]:
'''
load the pre-trained SBERT model; it is used to compute the similarity between two sentences
'''
model = SentenceTransformer('paraphrase-distilroberta-base-v1')





In [None]:
'''
function to calculate the similarity score of each similar headline against the reference headline
'''
def get_similarity_scores(reference_headline,similar_reference_headlines):
  cosine_scores=[]
  #Compute the embedding of the reference headline once
  ref_headline_embed = model.encode(reference_headline, convert_to_tensor=True)
  for headline in similar_reference_headlines:
    #Compute the embedding of the current similar headline
    headline_embed = model.encode(headline, convert_to_tensor=True)
    #print('embedding shape: ',headline_embed.shape)
    #Compute the cosine similarity and keep it as a float rounded to 2 decimals
    cosine_score = util.pytorch_cos_sim(ref_headline_embed, headline_embed).item()
    cosine_scores.append(round(cosine_score, 2))
  return cosine_scores
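The function above calls `model.encode` once per headline. A possible variant, sketched here under assumptions, encodes the reference headline and all similar headlines in a single call and computes the cosine scores with NumPy. `batched_similarity_scores` and its `encode_fn` parameter are hypothetical names; `encode_fn` is assumed to accept a list of sentences and return one embedding per sentence, which is how `model.encode` behaves when given a list.

```python
import numpy as np

def batched_similarity_scores(encode_fn, reference_headline, similar_headlines):
    """Score every similar headline against the reference with one encode call."""
    similar_headlines = list(similar_headlines)
    if not similar_headlines:
        return []
    # Row 0 is the reference headline, rows 1..n are the similar headlines.
    embeddings = np.asarray(
        encode_fn([reference_headline] + similar_headlines), dtype=float
    )
    # Normalise each row to unit length so a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = unit[1:] @ unit[0]
    return [round(float(score), 2) for score in scores]

# Stand-in encoder with made-up 2-D vectors, used only to exercise the function.
def fake_encode(sentences):
    table = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [1.0, 1.0]}
    return [table[sentence] for sentence in sentences]

print(batched_similarity_scores(fake_encode, "a", ["b", "c", "d"]))
```

In the notebook this would be called as `batched_similarity_scores(model.encode, reference_headline, similar_reference_headlines)`. An all-zero embedding would produce a NaN score; the sketch does not guard against that case.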

In [None]:
'''
test function 'get_similarity_scores'
'''
news_summary='A 58-year-old suspected COVID-19 patient, who was put on ventilator, was wrongly declared dead twice by authorities earlier this week at a hospital in Madhya Pradesh. The second time the patient\'s family even prepared for his funeral rites, only for the hospital officials to admit the mix-up shortly after. Officials at the hospital attributed the incident to \"confusion\".'
reference_headline='Suspected COVID-19 patient wrongly declared dead twice by MP hospital(Inshorts)'
similar_reference_headlines=['Madhya Pradesh Hospital Wrongly Declares Covid Patient Dead - Twice',
'In major goof-up, Madhya Pradesh man on ventilator declared dead twice',
'MP COVID mess: Patient declared dead twice by hospital; mismatch at official death figures']
print(get_similarity_scores(reference_headline,similar_reference_headlines))

df_row=article_details_df.iloc[1]
reference_headline=df_row['headline']
similar_reference_headlines=df_row['similar_headline']
print('reference_headline: ',reference_headline)
print('similar_reference_headlines: ',similar_reference_headlines)
print('similarity scores: ',get_similarity_scores(reference_headline,similar_reference_headlines))

[0.59, 0.44, 0.79]
reference_headline:  Gavaskar felicitated on 50th anniversary of Test debut
similar_reference_headlines:  ['Sunil Gavaskar felicitated on 50th anniversary of Test debut', 'Sunil Gavaskar felicitated on 50th anniversary of Test debut', 'Sunil Gavaskar felicitated on 50th anniversary of Test debut', 'Sunil Gavaskar felicitated by BCCI on 50th anniversary of his Test debut', 'Former India captain Sunil Gavaskar felicitated by BCCI on 50th anniversary of Test debut', 'Sunil Gavaskar Felicitated By BCCI On 50th Anniversary Of Test Debut. Watch', 'Sunil Gavaskar felicitated by BCCI on 50th anniversary of Test debut', 'BCCI Felicitates Sunil Gavaskar On His 50th Anniversary Of Test Debut']
similarity scores:  [0.95, 0.95, 0.95, 0.88, 0.86, 0.76, 0.88, 0.81]


In [None]:
'''
calculate the similarity scores of the similar headlines against the reference headline for every news data record
'''
similarity_scores=[]
for index, row in article_details_df.iterrows():
  reference_headline=row['headline']
  similar_reference_headlines=row['similar_headline']
  scores=get_similarity_scores(reference_headline,similar_reference_headlines)
  similarity_scores.append(scores)

In [None]:
'''
add calculated similarity score as new column in the existing news dataset
'''
article_details_df['similarity_scores']=similarity_scores
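With the scores stored next to the headlines, the filtering step mentioned in the introduction could look like the sketch below. The notebook does not state a cut-off, so the `threshold=0.6` default is purely an assumption, and `filter_similar_headlines` is a hypothetical helper. The labels `h1`..`h4` and `u1`..`u4` are placeholders; the scores are the ones computed for the first record.

```python
def filter_similar_headlines(headlines, urls, scores, threshold=0.6):
    """Keep only the headlines (and their URLs and scores) at or above threshold."""
    kept = [
        (headline, url, score)
        for headline, url, score in zip(headlines, urls, scores)
        if score >= threshold
    ]
    kept_headlines = [headline for headline, _, _ in kept]
    kept_urls = [url for _, url, _ in kept]
    kept_scores = [score for _, _, score in kept]
    return kept_headlines, kept_urls, kept_scores

# Placeholder headline/URL labels paired with the scores of the first record.
result = filter_similar_headlines(
    ["h1", "h2", "h3", "h4"],
    ["u1", "u2", "u3", "u4"],
    [0.7, 0.8, 0.8, 0.41],
)
print(result)
```

Applied row by row to `similar_headline`, `similar_headline_url` and `similarity_scores`, this would drop the fourth related headline of the first record, which scored 0.41.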

In [None]:
'''
show the first three records
'''
article_details_df.head(3)

Unnamed: 0,article_url,headline,content,author,published_date,read_more_source,similar_headline,similar_headline_url,similarity_scores
0,https://inshorts.com/en/news/you-shall-always-have-my-support-arjun-kapoor-on-janhvis-bday-1615047490059,You shall always have my support: Arjun Kapoor on Janhvi's b'day,"Taking to Instagram on Saturday, Arjun Kapoor posted a picture of himself with Janhvi Kapoor to wish the actress on her 24th birthday. In the picture, Arjun can be seen walking ahead while holding his sister's hand. ""Happy birthday Janhvi...I can't promise much except like this picture you shall always have my support & hand wherever you go,"" Arjun wrote.",,2021-03-06T16:18:10.000Z,PINKVILLA,"[On Janhvi Kapoor’s birthday, Arjun promises to be a constant support: You shall have my hand wherever you go, ""You will always have my support,"" Arjun Kapoor writes a heartfelt birthday note for Janhvi Kapoor., 'You shall always have my support': Arjun Kapoor pens heart-warming birthday note for Janhvi, Janhvi Kapoor rings in 24th birthday with her crew, sister Anshula plans huge surprise]","[https://www.pinkvilla.com/entertainment/news/janhvi-kapoor-s-birthday-arjun-promises-be-constant-support-you-shall-have-my-hand-wherever-you-go-633968, https://indianewsrepublic.com/you-will-always-have-my-support-arjun-kapoor-writes-a-heartfelt-birthday-note-for-janhvi-kapoor/222825/, https://english.lokmat.com/entertainment/you-shall-always-have-my-support-arjun-kapoor-pens-heart-warming-birthday-note-for-janhvi/, https://www.republicworld.com/entertainment-news/bollywood-news/janhvi-kapoor-rings-in-24th-birthday-with-her-crew-sister-anshula-plans-huge-surprise.html]","[0.7, 0.8, 0.8, 0.41]"
1,https://inshorts.com/en/news/gavaskar-felicitated-on-50th-anniversary-of-test-debut-1615046996400,Gavaskar felicitated on 50th anniversary of Test debut,"Ex-India captain Sunil Gavaskar was felicitated by the BCCI on the 50th anniversary of his Test debut on Saturday. The 71-year-old received a commemorative Test cap from BCCI Secretary Jay Shah, while several ex-cricketers took to Twitter to congratulate Gavaskar. ""The game is as strong as it is today because they made a start then against all odds,"" Ganguly tweeted.",,2021-03-06T16:09:56.000Z,read more source not available,"[Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated by BCCI on 50th anniversary of his Test debut, Former India captain Sunil Gavaskar felicitated by BCCI on 50th anniversary of Test debut, Sunil Gavaskar Felicitated By BCCI On 50th Anniversary Of Test Debut. 
Watch, Sunil Gavaskar felicitated by BCCI on 50th anniversary of Test debut, BCCI Felicitates Sunil Gavaskar On His 50th Anniversary Of Test Debut]","[https://www.thehindu.com/sport/cricket/sunil-gavaskar-felicitated-on-50th-anniversary-of-test-debut/article34005053.ece, https://www.deccanherald.com/sports/cricket/sunil-gavaskar-felicitated-on-50th-anniversary-of-test-debut-958854.html, https://timesofindia.indiatimes.com/sports/cricket/england-in-india/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut/articleshow/81362569.cms, https://www.sportskeeda.com/cricket/sunil-gavaskar-felicitated-by-bcci-50th-anniversary-test-debut, https://m.economictimes.com/news/sports/former-india-captain-sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut/articleshow/81362570.cms, https://sports.ndtv.com/cricket/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut-watch-2384926, https://www.indiatvnews.com/sports/cricket/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut-689160, https://cricketaddictor.com/cricket-news/bcci-felicitates-sunil-gavaskar-on-his-50th-anniversary-of-test-debut/]","[0.95, 0.95, 0.95, 0.88, 0.86, 0.76, 0.88, 0.81]"
2,https://inshorts.com/en/news/pv-sindhu-to-face-carolina-marin-in-her-1st-final-since-2019-world-cships-win-1615046535615,PV Sindhu to face Carolina Marin in her 1st final since 2019 World C'ships win,"Shuttler PV Sindhu on Saturday defeated Mia Blichfeldt in straight games to enter the final of the Swiss Open, where she'll face Carolina Marin. Notably, this is Sindhu's first final appearance since winning the BWF World Championships in August 2019. In the men's singles, Kidambi Srikanth suffered defeat in the semifinal against world number two Viktor Axelsen of Denmark.",,2021-03-06T16:02:15.000Z,read more source not available,"[Carolina Marin overpowers PV Sindhu in Swiss Open final, Sindhu becomes first Indian shuttler to win World C’ships gold, bear Okuhara in final, 21 wins, 2 silvers, 2 bronzes, 1 gold: Sindhu completes the Worlds set, PV Sindhu gets past feisty Blichfeldt to make Swiss Open final, Badminton World C’ships: Carolina Marin Beats PV Sindhu in Final, PV Sindhu vs Carolina Marin final, BWF World C'ships 2018, highlights: Marin outwits Sindhu, Carolina Marin]","[https://www.olympicchannel.com/en/stories/news/detail/badminton-carolina-marin-eases-past-pv-sindhu-swiss-open-final/, https://www.thehindu.com/sport/other-sports/pv-sindhu-becomes-first-indian-shuttler-to-win-world-championships-gold/article29253013.ece, https://www.espn.com/badminton/story/_/id/27459172/sindhu-completes-worlds-set, https://www.espn.com/badminton/story/_/id/31015323/pv-sindhu-gets-feisty-blichfeldt-make-swiss-open-final, https://www.thequint.com/sports/badminton/live-updates-sindhu-vs-marin-final-badminton-world-championships, https://www.timesnownews.com/sports/badminton/article/pv-sindhu-vs-carolina-marin-final-bwf-world-championships-2018-live-badminton-score-updates-sindhu-seeks-elusive-gold-nanjing-china/264675, https://www.deccanchronicle.com/content/tags/carolina-marin]","[0.68, 0.44, 0.37, 0.47, 0.7, 0.72, 0.37]"


In [None]:
'''
save the news-data DataFrame to a pickle file
'''
article_details_df.to_pickle('/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/news_article_with_sim_score.df')


In [None]:
'''
load the news dataset pickle file and show the first three records
'''
article_details_df=pd.read_pickle('/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/news_article_with_sim_score.df')
article_details_df.head(3)

Unnamed: 0,article_url,headline,content,author,published_date,read_more_source,similar_headline,similar_headline_url,similarity_scores
0,https://inshorts.com/en/news/you-shall-always-have-my-support-arjun-kapoor-on-janhvis-bday-1615047490059,You shall always have my support: Arjun Kapoor on Janhvi's b'day,"Taking to Instagram on Saturday, Arjun Kapoor posted a picture of himself with Janhvi Kapoor to wish the actress on her 24th birthday. In the picture, Arjun can be seen walking ahead while holding his sister's hand. ""Happy birthday Janhvi...I can't promise much except like this picture you shall always have my support & hand wherever you go,"" Arjun wrote.",,2021-03-06T16:18:10.000Z,PINKVILLA,"[On Janhvi Kapoor’s birthday, Arjun promises to be a constant support: You shall have my hand wherever you go, ""You will always have my support,"" Arjun Kapoor writes a heartfelt birthday note for Janhvi Kapoor., 'You shall always have my support': Arjun Kapoor pens heart-warming birthday note for Janhvi, Janhvi Kapoor rings in 24th birthday with her crew, sister Anshula plans huge surprise]","[https://www.pinkvilla.com/entertainment/news/janhvi-kapoor-s-birthday-arjun-promises-be-constant-support-you-shall-have-my-hand-wherever-you-go-633968, https://indianewsrepublic.com/you-will-always-have-my-support-arjun-kapoor-writes-a-heartfelt-birthday-note-for-janhvi-kapoor/222825/, https://english.lokmat.com/entertainment/you-shall-always-have-my-support-arjun-kapoor-pens-heart-warming-birthday-note-for-janhvi/, https://www.republicworld.com/entertainment-news/bollywood-news/janhvi-kapoor-rings-in-24th-birthday-with-her-crew-sister-anshula-plans-huge-surprise.html]","[0.7, 0.8, 0.8, 0.41]"
1,https://inshorts.com/en/news/gavaskar-felicitated-on-50th-anniversary-of-test-debut-1615046996400,Gavaskar felicitated on 50th anniversary of Test debut,"Ex-India captain Sunil Gavaskar was felicitated by the BCCI on the 50th anniversary of his Test debut on Saturday. The 71-year-old received a commemorative Test cap from BCCI Secretary Jay Shah, while several ex-cricketers took to Twitter to congratulate Gavaskar. ""The game is as strong as it is today because they made a start then against all odds,"" Ganguly tweeted.",,2021-03-06T16:09:56.000Z,read more source not available,"[Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated on 50th anniversary of Test debut, Sunil Gavaskar felicitated by BCCI on 50th anniversary of his Test debut, Former India captain Sunil Gavaskar felicitated by BCCI on 50th anniversary of Test debut, Sunil Gavaskar Felicitated By BCCI On 50th Anniversary Of Test Debut. 
Watch, Sunil Gavaskar felicitated by BCCI on 50th anniversary of Test debut, BCCI Felicitates Sunil Gavaskar On His 50th Anniversary Of Test Debut]","[https://www.thehindu.com/sport/cricket/sunil-gavaskar-felicitated-on-50th-anniversary-of-test-debut/article34005053.ece, https://www.deccanherald.com/sports/cricket/sunil-gavaskar-felicitated-on-50th-anniversary-of-test-debut-958854.html, https://timesofindia.indiatimes.com/sports/cricket/england-in-india/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut/articleshow/81362569.cms, https://www.sportskeeda.com/cricket/sunil-gavaskar-felicitated-by-bcci-50th-anniversary-test-debut, https://m.economictimes.com/news/sports/former-india-captain-sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut/articleshow/81362570.cms, https://sports.ndtv.com/cricket/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut-watch-2384926, https://www.indiatvnews.com/sports/cricket/sunil-gavaskar-felicitated-by-bcci-on-50th-anniversary-of-test-debut-689160, https://cricketaddictor.com/cricket-news/bcci-felicitates-sunil-gavaskar-on-his-50th-anniversary-of-test-debut/]","[0.95, 0.95, 0.95, 0.88, 0.86, 0.76, 0.88, 0.81]"
2,https://inshorts.com/en/news/pv-sindhu-to-face-carolina-marin-in-her-1st-final-since-2019-world-cships-win-1615046535615,PV Sindhu to face Carolina Marin in her 1st final since 2019 World C'ships win,"Shuttler PV Sindhu on Saturday defeated Mia Blichfeldt in straight games to enter the final of the Swiss Open, where she'll face Carolina Marin. Notably, this is Sindhu's first final appearance since winning the BWF World Championships in August 2019. In the men's singles, Kidambi Srikanth suffered defeat in the semifinal against world number two Viktor Axelsen of Denmark.",,2021-03-06T16:02:15.000Z,read more source not available,"[Carolina Marin overpowers PV Sindhu in Swiss Open final, Sindhu becomes first Indian shuttler to win World C’ships gold, bear Okuhara in final, 21 wins, 2 silvers, 2 bronzes, 1 gold: Sindhu completes the Worlds set, PV Sindhu gets past feisty Blichfeldt to make Swiss Open final, Badminton World C’ships: Carolina Marin Beats PV Sindhu in Final, PV Sindhu vs Carolina Marin final, BWF World C'ships 2018, highlights: Marin outwits Sindhu, Carolina Marin]","[https://www.olympicchannel.com/en/stories/news/detail/badminton-carolina-marin-eases-past-pv-sindhu-swiss-open-final/, https://www.thehindu.com/sport/other-sports/pv-sindhu-becomes-first-indian-shuttler-to-win-world-championships-gold/article29253013.ece, https://www.espn.com/badminton/story/_/id/27459172/sindhu-completes-worlds-set, https://www.espn.com/badminton/story/_/id/31015323/pv-sindhu-gets-feisty-blichfeldt-make-swiss-open-final, https://www.thequint.com/sports/badminton/live-updates-sindhu-vs-marin-final-badminton-world-championships, https://www.timesnownews.com/sports/badminton/article/pv-sindhu-vs-carolina-marin-final-bwf-world-championships-2018-live-badminton-score-updates-sindhu-seeks-elusive-gold-nanjing-china/264675, https://www.deccanchronicle.com/content/tags/carolina-marin]","[0.68, 0.44, 0.37, 0.47, 0.7, 0.72, 0.37]"


In [None]:
'''
save the news-data DataFrame to an Excel file
'''
article_details_df.to_excel('/content/gdrive/My Drive/capstone_project/news_url_with_headline/News_atricle_details/news_article_with_sim_score.xlsx')
