<p style="font-size:78px">Final Project IRWA (2024-2025)</p>

<p style="font-size:48px">Part 2: Indexing and Evaluation</p>

In [1]:
# Standard library imports
import os
import sys

# Third-party imports


# Local application imports
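# Make the project root (the parent directory) importable so that the local
# `irwa` package can be found. Inside a notebook `__file__` is not defined,
# so the current working directory is used instead.
current_dir = os.path.dirname(os.path.abspath(__file__)) if '__file__' in globals() else os.getcwd()
project_root = os.path.abspath(os.path.join(current_dir, '..'))
if project_root not in sys.path:
    sys.path.append(project_root)
import irwa.loading as ild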
import irwa.preprocessing as ipp
import irwa.indexing as ind
import irwa.ranking as irk

# Enable autoreload so that edits to the irwa modules take effect without restarting the kernel.
%load_ext autoreload
%autoreload 2

# 1) Indexing

In [2]:
# Loading and preprocessing
file_path = '../data/farmers-protest-tweets.json'
tweets = ild.load_tweets_from_json(file_path)
print(f"Loaded {len(tweets)} tweets")
# Path (not a DataFrame) to the CSV that maps document IDs (doc_*) to tweet IDs
tweet_document_ids_map_df = "../data/tweet_document_ids_map.csv"
docid_to_tweetid, token_tweets = ipp.create_tokenized_dictionary(tweets, tweet_document_ids_map_df)
print(f"Loaded {len(token_tweets)} documents with their corresponding tokenized tweet content")

Loaded 117407 tweets
Loaded 48429 documents with their corresponding tokenized tweet content
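The exact steps of `ipp.create_tokenized_dictionary` live in the project's `irwa.preprocessing` module and are not shown here. As a rough illustration only, a typical tweet tokenizer lowercases the text, drops URLs and @mentions, keeps the words inside hashtags, and removes stopwords. The function name `tokenize_tweet` and the tiny stopword list are hypothetical. Stemming is left out to keep the sketch dependency-free, although the Query 4 results further down (where "Corruption" matches the query term "corrupt") suggest the real pipeline does stem.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

URL_RE = re.compile(r"https?://\S+")   # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+")       # user mentions such as @UNHumanRights
TOKEN_RE = re.compile(r"[a-z0-9]+")    # runs of lowercase ASCII letters/digits


def tokenize_tweet(text: str) -> list[str]:
    """Lowercase, strip URLs and @mentions, split into word tokens, drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # '#' is not matched by TOKEN_RE, so '#farmersprotest' becomes 'farmersprotest'
    tokens = TOKEN_RE.findall(text)
    return [tok for tok in tokens if tok not in STOPWORDS]


print(tokenize_tweet("Please support farmers ... the hands that feed the world @VP #FarmersProtest https://t.co/CqJKsxizWf"))
# ['please', 'support', 'farmers', 'hands', 'that', 'feed', 'world', 'farmersprotest']
```

Because `TOKEN_RE` only matches ASCII letters and digits, non-Latin text (such as the Punjabi sentence in doc_811) is dropped entirely; a real pipeline may want a Unicode-aware pattern instead.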


In [3]:
# Inverted Index construction
inverted_index = ind.create_inverted_index(token_tweets)
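The layout of the structure returned by `ind.create_inverted_index` is defined in `irwa.indexing`. A minimal sketch, assuming the index maps each term to the documents that contain it together with the term's frequency in each document (some implementations store positions instead), could look like this; `build_inverted_index` is a hypothetical name.

```python
from collections import defaultdict


def build_inverted_index(token_docs: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Map every term to a postings dict {doc_id: term frequency in that doc}."""
    index = defaultdict(dict)
    for doc_id, tokens in token_docs.items():
        for term in tokens:
            postings = index[term]
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return dict(index)


docs = {
    "doc_1": ["indian", "farmers", "protest", "farmers"],
    "doc_2": ["support", "farmers"],
}
index = build_inverted_index(docs)
print(index["farmers"])  # {'doc_1': 2, 'doc_2': 1}
print(index["protest"])  # {'doc_1': 1}
```

Keeping term frequencies in the postings lets a TF-IDF ranker read the document frequency of a term as `len(postings)` without another pass over the corpus.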

In [25]:
# Definition of test queries
query1 = ["indian", "protest"]       # Example given in handout
query2 = ["support", "farmers"]      # Example given in handout
query3 = ["delhi", "farmers"]
query4 = ["government", "corrupt"]
query5 = ["president", "india"]
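Each query below is scored with `irk.tf_idf`, whose exact weighting is defined in `irwa.ranking` and is not reproduced here. For orientation, a common unnormalized TF-IDF variant scores a document `d` for query `q` as the sum over query terms `t` of `tf(t, d) * log(N / df(t))`, where `N` is the number of documents and `df(t)` the number of documents containing `t`. The sketch assumes an index that maps each term to a `{doc_id: term frequency}` dict; `tf_idf_scores` is a hypothetical name, and the real function may normalize or weight terms differently.

```python
import math


def tf_idf_scores(index: dict[str, dict[str, int]], query: list[str], num_docs: int) -> dict[str, float]:
    """Score every document containing at least one query term with sum(tf * idf)."""
    scores: dict[str, float] = {}
    for term in query:
        postings = index.get(term)
        if not postings:  # a term absent from the collection contributes nothing
            continue
        idf = math.log(num_docs / len(postings))  # df(t) = number of docs in the postings
        for doc_id, tf in postings.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return scores


toy_index = {
    "indian": {"d1": 2},
    "protest": {"d1": 1, "d2": 1},
    "farmers": {"d1": 1, "d2": 1, "d3": 1},
}
scores = tf_idf_scores(toy_index, ["indian", "protest"], num_docs=3)
print(scores)  # d1 outscores d2; d3 contains no query term and is absent
```

Under this variant a term that occurs in every document (here `farmers`) gets weight zero, and the score grows linearly with raw term frequency, which would favour repetitive tweets such as doc_30305 in Query 5.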

#### Query 1

In [26]:
# Ranking results with TF-IDF
scores_q1 = irk.tf_idf(inverted_index, query1, token_tweets)
irk.sort_scores_tf_idf(scores_q1, docid_to_tweetid, tweets, 5)

Top 5 Results:
Document doc_13095: 13.035678814641825
Content: INDIAN FARMERS are protesting in DELHI for last 3 months. 220+ farmers had died so far in #FarmersProtest .Protests are held all over the world to show solidarity with Indian Farmers.A protest will be held in Australia this Sunday.
#DPstopIntimidatingFarmers
@UNHumanRights
@bbc https://t.co/Ct5hqEEXRE
Document doc_445: 12.575039678288194
Content: Farmers Protest | Pawri Ho Rahi Hai 🌾
Dedicated to The 2020–2021 Indian farmers' protest. #FarmersProtest​ is an ongoing protest against three farm acts which were passed by the Parliament of India in Sep 2020. Millions of farmers are protesting in India.
https://t.co/cR5ltghf6X
Document doc_5374: 11.534076978962176
Content: @VP Dear madam,
Not only Indian farmers need justice but every Indian need justice who love democracy
please save Indian democracy and Indian constitution🙏🙏🙏
#FarmersProtest
Document doc_9022: 11.534076978962176
Content: #modi_rojgar_do - indian youth.
#Farmers
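`irk.sort_scores_tf_idf` prints the top-k documents with their scores and tweet text. Several outputs in this notebook contain tied scores (doc_5374 and doc_9022 in Query 1; doc_941, doc_3218, doc_3224 and doc_3227 in Query 3), so the order among tied documents depends on how the sort breaks ties. A minimal sketch of a deterministic top-k selection with `heapq`, breaking ties by document ID, follows. The names `top_k` and `format_results`, and the assumption that `docid_to_tweetid` and `tweet_texts` are plain string-to-string dicts, are illustrative only.

```python
import heapq


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (doc_id, score) pairs; ties broken by doc_id."""
    # nsmallest on (-score, doc_id) yields the highest score first, then alphabetical doc_id
    return heapq.nsmallest(k, scores.items(), key=lambda item: (-item[1], item[0]))


def format_results(ranked: list[tuple[str, float]],
                   docid_to_tweetid: dict[str, str],
                   tweet_texts: dict[str, str]) -> str:
    """Render ranked results as 'Document <id>: <score>' followed by the tweet text."""
    lines = [f"Top {len(ranked)} Results:"]
    for doc_id, score in ranked:
        text = tweet_texts.get(docid_to_tweetid.get(doc_id, ""), "<missing>")
        lines.append(f"Document {doc_id}: {score}")
        lines.append(f"Content: {text}")
    return "\n".join(lines)


scores = {"doc_a": 1.5, "doc_b": 3.0, "doc_c": 1.5, "doc_d": 0.2}
print(top_k(scores, 3))  # [('doc_b', 3.0), ('doc_a', 1.5), ('doc_c', 1.5)]
```

`heapq.nsmallest` runs in O(n log k), which avoids fully sorting tens of thousands of scored documents when only the first five are printed.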

#### Query 2

In [27]:
# Ranking results with TF-IDF
scores_q2 = irk.tf_idf(inverted_index, query2, token_tweets)
irk.sort_scores_tf_idf(scores_q2, docid_to_tweetid, tweets, 5)

Top 5 Results:
Document doc_36673: 10.905929444908995
Content: We Support #FarmersProtest
We Support #GretaThunberg 
We Support #Rehanna
We Support #NodeepKaur
We Support #DishaRavi

#ReleaseDishaRavi #ReleaseNovdeepKaur
Document doc_811: 6.5435576669453965
Content: #FarmersProtest Please Check Spellings carefully before you hashtag, there are some false hashtags to confuse the supporters with the aim to increase the chances of error and of course support. ਅੱਖ ਬਾਜ ਵਾਲੀ ਰੱਖਣੀ ਜੀ ਗਲਤੀ ਤੇ ਕਰਨੀ ਨੀ.. Full Support https://t.co/YVBMuY84mP
Document doc_4323: 6.5435576669453965
Content: Please support farmers ... the hands that feed the world 
If you support India, then support Indian farmers
 
🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻

#Pagdi_Sambhal_Jatta
#FarmersProtest
1 https://t.co/CqJKsxizWf
Document doc_4325: 6.5435576669453965
Content: #Pagdi_Sambhal_Jatta
🇮🇳🇮🇳🇮🇳🇮🇳🇮🇳🇮🇳🇮🇳🇮🇳🇮🇳
🚜🚜🚜🚜🚜🚜🚜🚜🚜
Please support farmers ... the hands that feed the world 
If you support India, then support Indian farmers
 
🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻

#Pagdi_Samb

#### Query 3

In [28]:
# Ranking results with TF-IDF
scores_q3 = irk.tf_idf(inverted_index, query3, token_tweets)
irk.sort_scores_tf_idf(scores_q3, docid_to_tweetid, tweets, 5)

Top 5 Results:
Document doc_941: 10.344949174253891
Content: @anilca95 @ArvindKejriwal Our honorable CM is busy with farmers from outside and handed over DELHI to #FarmersProtest He has no time or attention to problems of Delhi, yamuna continues to suffer, pollute and no govt has any policy or plan to #save yamuna! Sic of you Delhi political circles
Document doc_3218: 10.344949174253891
Content: @ANI Lakha Sidhana at Mehraj rally. During rally Harjit Dhapali said that if Delhi Police arrests Lakha Sadana, a protest march will be taken against Delhi police in Delhi. #FarmersProtest
Document doc_3224: 10.344949174253891
Content: Lakha Sidhana at Mehraj rally. During rally Harjit Dhapali said that if Delhi Police arrests Lakha Sadana, a protest march will be taken against Delhi police in Delhi. #FarmersProtest https://t.co/AkAH2K5YK6
Document doc_3227: 10.344949174253891
Content: Lakha Sidhana at Mehraj rally. During rally Harjit Dhapali said that if Delhi Police arrests Lakha Sadana, a p

#### Query 4

In [29]:
# Ranking results with TF-IDF
scores_q4 = irk.tf_idf(inverted_index, query4, token_tweets)
irk.sort_scores_tf_idf(scores_q4, docid_to_tweetid, tweets, 5)

Top 5 Results:
Document doc_37665: 14.465122043105842
Content: Good news for Indian, bad news for Fake #FarmersProtest corrupt #DhruvRathee corrupt #BarkhaDutt corrupt @ndtv Antinational #Sikh #Khalistanis https://t.co/FUeSUyjII8
Document doc_27629: 10.78785408682564
Content: @newslaundry @NidhiSuresh_ #Toolkit had nothing to do with real farmers. It was for anti nationals, anto Modi, anti governmental people and pro #khalistanis to create riots in our country in the name of #FarmersProtest
Document doc_5022: 9.643414695403894
Content: No Rules No Worries , They Own The Banks Control Police  Control Elections, Control Supreme remember them jokers.  Abuse of power All Corruption Corruption. Sad Days In India. But Farmers will not be your Slaves 😡
#ModiIgnoringFarmersDeaths 
#FarmersProtest https://t.co/8SIWAnu4D3
Document doc_5856: 9.643414695403894
Content: Corruption in the Voting System. Candidates will be scared to Run BJP, Electronic Corruption with Voting system. This is All Viola

#### Query 5

In [30]:
# Ranking results with TF-IDF
scores_q5 = irk.tf_idf(inverted_index, query5, token_tweets)
irk.sort_scores_tf_idf(scores_q5, docid_to_tweetid, tweets, 5)

Top 5 Results:
Document doc_30305: 13.060837919626664
Content: India doesn't give a shit about minorities. India doesn't give a shit about minorities. India doesn't give a shit about minorities. India doesn't give a shit about minorities. India doesn't give a shit about minorities. India doesn't give a shit about minorities. #FarmersProtest
Document doc_26023: 10.884031599688885
Content: mandeeptoronto: #FarmersMakeIndia

Farmers are the Backbone of India
Farmers are the Blood of India
Farmers make the Food of India
Farmers are the People of India
Farmers are India

#FarmersProtest https://t.co/Wx7GsuMPYr https://t.co/EnHIwtiLml
Document doc_22942: 8.707225279751109
Content: @ZeeNewsEnglish Sir, you are spoiling India's image. India is not BJP or Modiji. India is India because of our great democracy and constitution. By protecting the wrongdoings of govt &amp; spreading fake news &amp; propoganda you are spoiling our image 🇮🇳

#FarmersProtest
#GodiMedia
Document doc_23630: 8.7072252797