# Udacity Custom Chatbot Project 2025

## Step 1 - Preparing the Dataset

### Reason for selecting the '2024 in India' Wikipedia page (https://en.wikipedia.org/wiki/2024_in_India) is that in that Year India has grown vibrantly in Finance, Economy, Research, Education and in many other sectors and achieved a new everest at Global Level. Also at study point of view it might be helpful for scholars.

In [1]:
import os

In [1]:
import requests

import pandas as pd
import re
from dateutil.parser import parse

import warnings
warnings.filterwarnings('ignore')

In [3]:
response = requests.get("https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exlimit=1&titles=2024_in_India&explaintext=1&formatversion=2&format=json")
response

<Response [200]>

In [4]:
response.json()['query']['pages'][0]['extract'].split("\n")

['The following is a list of events for the year 2024 in India.',
 '',
 '',
 '== Incumbents ==',
 '',
 '',
 '=== National government ===',
 '',
 '',
 '=== State governments ===',
 '',
 '',
 '== Events ==',
 '',
 '',
 '=== January ===',
 '1 January - ISRO successfully launches its first X-Ray polarimeter satellite XPoSat to study the polarization of intense X-Ray sources in space.',
 "2 January - 2023–2024 Indian truckers' protests: Protests are organized by Indian truckers against the severity of the newly proposed law in dealing with the hit-and-run cases.",
 '3 January -',
 'A court in Jaunpur sentences two men to death over the 2005 Jaunpur train bombing which killed 14 people.',
 'A bus carrying 45 passengers collides with a truck in Golaghat district in Assam killing 12 and injuring 30 others.',
 '5 January - 2024 Sandeshkhali violence: A team of ED officers are injured in clashes with local supporters of Shahjahan Sheikh in Sandeshkhali, West Bengal.',
 '6 January -',
 "ISRO's Ad

In [5]:
df_India_2024 = pd.DataFrame()
df_India_2024

In [6]:
df_India_2024['text'] = response.json()['query']['pages'][0]['extract'].split("\n")
df_India_2024

Unnamed: 0,text
0,The following is a list of events for the year...
1,
2,
3,== Incumbents ==
4,
...,...
660,== Notes ==
661,
662,
663,== External links ==


In [7]:
# Removing null values or keeping records will values in test column

df_India_2024 = df_India_2024[df_India_2024["text"].str.len() > 0]
df_India_2024

Unnamed: 0,text
0,The following is a list of events for the year...
3,== Incumbents ==
6,=== National government ===
9,=== State governments ===
12,== Events ==
...,...
654,"Satish Pradhan, 84, politician"
657,== References ==
660,== Notes ==
663,== External links ==


In [8]:
# Removing header values within column

df_India_2024 = df_India_2024[~df_India_2024["text"].str.startswith("==")]
df_India_2024

Unnamed: 0,text
0,The following is a list of events for the year...
16,1 January - ISRO successfully launches its fir...
17,2 January - 2023–2024 Indian truckers' protest...
18,3 January -
19,A court in Jaunpur sentences two men to death ...
...,...
651,"26 December- Manmohan Singh, 92, politician an..."
652,29 December-
653,"Kishore Kunal, 74, police officer"
654,"Satish Pradhan, 84, politician"


In [9]:
df_India_2024.head(20)

Unnamed: 0,text
0,The following is a list of events for the year...
16,1 January - ISRO successfully launches its fir...
17,2 January - 2023–2024 Indian truckers' protest...
18,3 January -
19,A court in Jaunpur sentences two men to death ...
20,A bus carrying 45 passengers collides with a t...
21,5 January - 2024 Sandeshkhali violence: A team...
22,6 January -
23,ISRO's Aditya-L1 spacecraft on India's first s...
24,Three Maldives government ministers (Maryam Sh...


In [10]:
df_India_2024.drop(index=df_India_2024.index[0], inplace=True)
df_India_2024

Unnamed: 0,text
16,1 January - ISRO successfully launches its fir...
17,2 January - 2023–2024 Indian truckers' protest...
18,3 January -
19,A court in Jaunpur sentences two men to death ...
20,A bus carrying 45 passengers collides with a t...
...,...
651,"26 December- Manmohan Singh, 92, politician an..."
652,29 December-
653,"Kishore Kunal, 74, police officer"
654,"Satish Pradhan, 84, politician"


In [11]:
df_India_2024.head(30)

Unnamed: 0,text
16,1 January - ISRO successfully launches its fir...
17,2 January - 2023–2024 Indian truckers' protest...
18,3 January -
19,A court in Jaunpur sentences two men to death ...
20,A bus carrying 45 passengers collides with a t...
21,5 January - 2024 Sandeshkhali violence: A team...
22,6 January -
23,ISRO's Aditya-L1 spacecraft on India's first s...
24,Three Maldives government ministers (Maryam Sh...
25,7 January - Tribal protests are held in Hasdeo...


In [12]:
text = """Events[edit]January[edit]
1 January - ISRO successfully launches its first X-Ray polarimeter satellite XPoSat to study the polarization of intense X-Ray sources in space.[1]
2 January - 2023–2024 Indian truckers' protests: Protests are organized by Indian truckers against the severity of the newly proposed law in dealing with the hit-and-run cases.[2]
3 January -A court in Jaunpur sentences two men to death over the 2005 Jaunpur train bombing which killed 14 people.[3]
A bus carrying 45 passengers collides with a truck in Golaghat district in Assam killing 12 and injuring 30 others.[4]
5 January - 2024 Sandeshkhali violence: A team of ED officers are injured in clashes with local supporters of Shahjahan Sheikh in Sandeshkhali, West Bengal.
6 January -ISRO's Aditya-L1 spacecraft on India's first solar mission, successfully enters its final orbit around the first Sun-Earth Lagrangian point (L1), approximately 1.5 million kilometers from the earth.[5]
Three Maldives government ministers (Maryam Shiuna, Malsha Shareef and Mahzoom Majid) make derogatory remarks against the Indian Prime Minister on social media, causing massive backlash and tourism boycott in India, and triggering the 2024 India-Maldives diplomatic row, which forced the Maldivian government to suspend the ministers.[6][7]
7 January - Tribal protests are held in Hasdeo Arand, Chhattisgarh against the felling of four lakh trees at the Parsa Coal Mines, operated by Adani Group and Rajasthan Rajya Vidyut Utpadan Nigam.[8]
12 January - Prime Minister Modi inaugurates Mumbai Trans Harbour Link, the longest bridge in India that connects Mumbai with Navi Mumbai.[9][10]
14 January - Rahul Gandhi commences his Bharat Jodo Nyay Yatra from Imphal, Manipur.[11]
22 January - Ram Mandir Prana Pratishtha: The Ram Mandir at Ayodhya in Uttar Pradesh is inaugurated by Prime Minister Narendra Modi.[12]
25 January - French President Emmanuel Macron arrives in Jaipur as part of his two-day state visit to India and holds a joint roadshow with Indian Prime Minister Modi from Jantar Mantar to Sanganeri gate.[13]
26 January - India's 75th Republic Day is celebrated with French President Macron participating as the Chief Guest.[14]
28 January - 2024 Bihar political crisis: Chief Minister of Bihar Nitish Kumar resigns, ending the coalition with the INDIA bloc and takes oath as the Chief Minister for the ninth time by joining the Bharatiya Janata Party (BJP)-led NDA.[15]
29 January - An 18 year old JEE aspirant dies by suicide in Kota, Rajasthan leaving behind a suicide note.[16][17]
30 January -Three Central Reserve Police Force (CRPF) personnel are killed and 14 others injured in a Naxal attack near Tekalgudem village in Chhattisgarh.[18]
The historic Masjid Akhonji in Mehrauli, Delhi, is demolished by the Delhi Development Authority.[19][20]
31 January - Jharkhand Chief Minister Hemant Soren is arrested by the Enforcement Directorate (ED) in connection with a money laundering case linked to a land scam.[21]
February[edit]
2 February - Champai Soren takes oath as the new Chief Minister of Jharkhand following the arrest of Hemant Soren.[22]
6 February -An explosion at a firecracker factory in Harda, Madhya Pradesh, leaves 11 dead, over 100 injured and destroys 60 nearby houses, leading to the evacuation of over 100 houses.[23]
The Election Commission of India recognises Ajit Pawar's faction as the official Nationalist Congress Party (NCP) and grants his faction the party's poll symbol and name, while directing Sharad Pawar's camp to take a new name.[24]
7 February - The Uttarakhand Legislative Assembly passes the Uniform Civil Code (UCC) bill 2024, making Uttarakhand the first state in India to pass a law on the same.[25]
8 February - 2024 Haldwani violence: Four people are killed and over 100 are injured after clashes break out between rioters and police in Haldwani, Uttarakhand after the demolition of an illegal madrasa following a court order, prompting the suspension of internet services, closure of schools, and issuance of shoot-at-sight orders against the rioters.[26][27]
12 February -Seven of the eight Indian Navy veterans who were released after being initially sentenced to death on espionage charges in Qatar, reach India.[28][29]
2024 Indian farmers' protest: Second round of protests and road blockades by farmers begin in the northern states of Punjab and Haryana, demanding Minimum Support Price and waiver of agricultural loans.[30]
15 February -A unanimous judgement by the constitutional bench of the Supreme Court of India strikes down the Electoral bond scheme, stating it to be unconstitutional, arbitrary and violation of right to information under article 19 of the Constitution.[31]
A fire breaks out at a paint factory in Alipur, Delhi, resulting in 11 deaths and 4 injuries.[32]
17 February -ISRO launches the INSAT-3DS meteorological satellite.[33][34]
2024 Virudhunagar explosions: A firecracker factory explosion in Virudhunagar district, Tamil Nadu, leaves 10 dead and seven injured.[35]
23 February – 2024 Indian farmers' protest: A farmer dies amidst the ongoing protests, raising the death toll to five.[36]
24 February – A tractor carrying a wagon loaded with Hindu pilgrims overturns and falls into a pond in Kasganj District, Uttar Pradesh, killing 23 people and injuring nine.[37]
25 February – Former MLA and then INLD state president Nafe Singh Rathee is shot and killed in Bahadurgarh, Jhajjar district, Haryana.[38]
27 February - 2024 Rajya Sabha elections: Elections are held to elect 65 of the 245 members of Rajya Sabha with the BJP winning 32 seats, two more than what it held previously at the expense of Indian National Congress, whose candidates lost in Himachal Pradesh and Uttar Pradesh due to cross voting of legislators from the Congress and the Samajwadi Party.[39]
March[edit]
1 March – 2024 Bangalore Cafe bombing: At least eight people are injured after an improvised explosive device explodes in a café in Bengaluru.[40]
9 March –Election Commissioner Arun Goel resigns ahead of the general elections.[41]
The Miss World 2023 pageant is held in Mumbai.[42]
11 March – The union government announces the implementation of the Citizenship Amendment Act, which enables persecuted minority communities from select religions (including Hindus, Sikhs, Jains, Christians, Buddhists, and Parsis) from India's neighboring countries Pakistan, Afghanistan and Bangladesh to acquire Indian citizenship.[43]
12 March – The State Bank of India submits the data about electoral bonds before the Election Commission after the Supreme Court dismissed its plea seeking an extension up to 30 June for disclosing details about electoral bonds.[44][45]
14 March – The Election Commission publishes data on electoral bonds submitted by the State Bank.[46]
16 March – Chief Election Commissioner Rajiv Kumar announces the schedule for the upcoming general elections.[47]
19 March – Two children are attacked and killed in Budaun with the suspect killed in an encounter by the Uttar Pradesh Police.[48]
21 March – Arrest of Arvind Kejriwal: Delhi Chief Minister Arvind Kejriwal is arrested by the Enforcement Directorate (ED) in connection with the Delhi excise policy case.[49]
26 March –Sonam Wangchuk ends his 21-day hunger strike demanding statehood for Ladakh.[50]
Two Chinese nationals are arrested by the Indian Police near the India–Nepal border after they were intercepted entering Uttar Pradesh illegally.[51]
April[edit]
3 April – Seven people, including two children are killed in a fire at a tailoring shop in Aurangabad.[52]
12 April – Two suspects in the March café bombing in Bengaluru are arrested in Kolkata.[53]
14 April – The BJP launches its manifesto for the 2024 Lok Sabha election titled ‘Sankalp Patra’ (Resolution Letter).[54]
16 April –At least six people are killed and three others are reported missing after a boat capsizes along the Jhelum River in Srinagar.[55]
2024 Kanker clash: Twenty-nine Naxalites are killed and three members of the security forces are injured during a police raid in Kanker District, Chhattisgarh.[56]
19 April–1 June - 2024 Indian general election: First of the seven phases of polling begins to elect members to the 18th Lok Sabha with the election expected to be the largest in history.[57]
19 April –2024 Arunachal Pradesh Legislative Assembly election
2024 Sikkim Legislative Assembly election
A restaurant fire that spreads to an adjacent hotel in Patna kills six people and injures 20.[58]
29 April – 2024 Indian heat wave: Two more people die in South India as a heatwave sweeps across India.[59]
May[edit]
1 May – 2024 Indian bomb hoaxes: About 150 schools in Delhi receive bomb threats, prompting evacuation and closure of schools in the region.[60][61]
4 May – Minister of External Affairs S. Jaishankar rejects comments made by US President Joe Biden saying that India's economic growth was being held back by xenophobia.[62]
9 May – Air India Express cancels more than 85 flights due to staff calling in sick, linked to a protest against working conditions imposed by the new owner Tata Group.[63]
10 May –The Supreme Court grants Arvind Kejriwal interim bail in connection with the Delhi liquor policy money laundering case thus permitting him to campaign in general elections.[64]
Security forces kill 12 Maoists in Chhattisgarh.[65]
13 May - 1 June – 2024 Odisha Legislative Assembly election
13 May –Mumbai hoarding collapse: 17 people are killed and 74 others are injured after an illegal hoarding collapses onto a gas station during a storm in the Ghatkopar suburb of Mumbai.[66]
2024 Andhra Pradesh Legislative Assembly election.
19 May – 2024 Pune Porsche car crash: A high speeding Porsche Taycan car driven by Vedant Agarwal under the influence of alcohol kills two software engineers in Kalyani Nagar, Pune.
21 May –The government declares a one-day state mourning in India for Iranian President Ebrahim Raisi following his death in a helicopter crash.[67]
Charges of sexual harassment are filed against former Wrestling Federation of India president Brij Bhushan Sharan Singh in a Delhi court.[68]
22 May –2024 Indian bomb hoaxes: The North Block building of the Central Secretariat which houses the Home Ministry receives a bomb threat email that is subsequently dismissed as a hoax.[69]
The Calcutta High Court issues an order invalidating all Other Backward Classes (OBC) certificates issued by the Government of West Bengal since 2010.[70]
23 May – 2024 Thane explosion: At least nine people are killed and 64 others injured after a fire caused by an exploding boiler breaks out at a chemical factory in Dombivli, outside Thane.[71][72]
25 May -2024 Rajkot gaming zone fire: At least 33 people are killed including nine children in a fire at a gaming arcade in Rajkot, Gujarat.[73][74]
At least seven infants are killed in a fire in a baby care facility at a hospital in Shahdara, East Delhi.[75]
26 May – Twelve people are killed during heavy rains as Cyclone Remal makes landfall at West Bengal.[76]
28 May – Seventeen people are killed and 12 others are reported missing after a stone quarry collapses due to heavy rains caused by cyclone Remal in Melthum, Mizoram.[77]
29 May – 2024 Indian heat wave: The India Meteorological Department records a maximum temperature of 52.3 °C (126.1 °F) at its station in the Mungeshpur area of Delhi, the highest ever temperature recorded in the city.[78]
30 May -A bus carrying Hindu pilgrims falls into a gorge in Jammu and Kashmir, killing 21 people and injuring 35 others.[79]
2024 Indian heat wave: At least 15 people die in a heatwave that affects northern and central India.[80]
31 May – Prajwal Revanna, Member of Parliament from Karnataka representing the Janata Dal (Secular), is arrested in Bengaluru over sexual assault charges.[81]
June[edit]
4 June –Results are declared for the 2024 Indian general election.[82]
2024 NEET controversy : A controversy erupts due to the sudden declaration of the NEET-UG results 10 days earlier than the originally scheduled date amid allegations of irregularities and paper leaks.[83]
President Droupadi Murmu dissolves the 17th Lok Sabha.[84]
6 June – 2024 Uttarakhand snowstorm disaster: A blizzard in Uttarakhand kills nine trekkers from Karnataka.[85]
9 June –2024 Reasi attack: Nine people are killed and 33 others are injured after a bus carrying Hindu pilgrims is attacked by Lashkar-e-Taiba militants near Reasi, Jammu and Kashmir.[86]
Narendra Modi takes his oath of office as Prime Minister of India for the third time.[87]
11 June - Kannada cinema actor Darshan is arrested by Karnataka Police in connection with a murder case.[88]
12 June – The Indian external affairs ministry issues a notice urging the Russian Government to quickly return all Indian nationals who are serving in the Russian army after two Indians recruited are killed in the war with Ukraine.[89]
14 June –2024 Northeast India floods: Heavy rains and landslides kill six people across Sikkim.[90]
The World Health Organization officially confirms a human Influenza A virus subtype H9N2 case in a child in West Bengal.[91]
17 June – 2024 West Bengal train collision: A goods train collides with the Kanchenjunga Express near New Jalpaiguri station in Darjeeling District, West Bengal, killing 15 people and injuring 60 others.[92]
19 June – 2024 Northeast India floods: Six people are killed in flooding and landslides in Assam.[93]
20 June – 2024 Tamil Nadu alcohol poisoning: At least 47 people are reported killed after suffering methanol poisoning caused by tainted liquor in Kallakurichi district in Tamil Nadu.[94][95]
21 June – A Valentine's Day film released.
22 June – Subodh Kumar Singh is dismissed as the director general of the National Testing Agency following uproar over the 2024 NEET controversy.[96]
27 June – President Droupadi Murmu inaugurates the 18th Lok Sabha.[97]
28 June – One person is killed after a roof at Terminal 1 of Indira Gandhi International Airport in Delhi collapses amid heavy rains.[98]
29 June –Five soldiers are killed after an Indian Army tank sinks while crossing the Shyok River in Saser Brangsa, Ladakh.[99]
2024 Virudhunagar explosions: An explosion firecracker factory in Virudhunagar district in Tamil Nadu leaves four dead and one injured.[100]
India beat South Africa and wins the 2024 ICC Men's T20 World Cup.
30 June – 2024 Lonavala waterfall tragedy: Five people, including four children, are swept away by sudden flooding near a waterfall close to Bhushi Dam in Lonavala, Maharashtra.[101]
July[edit]
1 July – The Bharatiya Nyaya Sanhita and two other laws passed in 2023 come into effect as the country's criminal codes, replacing the Indian Penal Code and related laws enacted during the colonial era.[102]
2 July – 2024 Uttar Pradesh crowd crush: At least 123 people die in a crowd crush at a religious event in Rati Bhanpur village within the Sikandra Rau area of Hathras district, Uttar Pradesh.[103]
2–3 July – 2024 India-Bangladesh floods: At least sixteen people are killed by floods and landslides in Assam and Arunachal Pradesh, while over 300,000 more are displaced.[104][105]
5 July – A victory parade is held in Mumbai by the Indian cricket team following their victory in the 2024 ICC Men's T20 World Cup.[106][107]
6 July – 2024 Surat building collapse: At least seven people are killed and more than 15 others injured in the collapse of a residential building in Surat, Gujarat.[108][109]
7 July – Two soldiers and six militants are killed during two separate encounters in Kulgam district, Jammu and Kashmir.[110][111]
8 July – Five soldiers are killed in an ambush by militants in Kathua district, Jammu and Kashmir.[112]
10 July – Eighteen people are killed after a double-decker bus collides with a milk truck in Uttar Pradesh.[113]
12 July – The wedding of Anant Ambani, the second son of Asia's richest-man Mukesh Ambani, and heiress Radhika Merchant is held in Mumbai in a lavish ceremony attended by international VIPs.[114]
•14 July - ISRO successfully launched chandrayan III dated on 14 July 2024* 15 July – Four soldiers including an officer are killed in an ambush by militants in Doda district, Jammu and Kashmir.[115]
18 July –Twelve coaches of the Dibrugarh–Chandigarh Express train derail in Gonda, Uttar Pradesh, resulting in at least four dead and 25 injured.[116][117]
Naxalite–Maoist insurgency: Two soldiers are killed and four others injured in an attack by Maoists in Bijapur district, Chhattisgarh.[118]
19 July – WazirX, an Indian cryptocurrency exchange owned by Binance, announces a security breach in which US$234 million in cryptocurrency was stolen, amounting to half of the platform's total assets.[119]
26 July –The Charaideo maidam tomb complex in Assam is designated as a World Heritage Site by UNESCO.[120]
China and India agree to cooperate in withdrawing all their troops from their disputed border, with aims of peacefully achieving "complete disengagement" from the border conflict as quickly as possible.[121]
27 July – 3 UPSC aspirants die due to excessive flooding of the basement of the Rau IAS coaching centre in Rajendra Nagar, Delhi.[122][123]
28 July – Manu Bhaker wins a bronze medal in the 10 meter air pistol the 2024 Summer Olympics in Paris.
30 July –2024 Wayanad landslides – At least 231 people are killed, 397 injured and 118 missing following landslides in Wayanad district, Kerala.[a][127]
A train derails near Barabamboo, Jharkhand, killing two people and injuring 20 others.[128]
Manu Bhaker and Sarabjot Singh win bronze medals in Mixed 10 metre air pistol team at the 2024 Summer Olympics in Paris.[129]
August[edit]
1 August –Eleven people are killed by heavy downpours and flooding in Delhi and North India and over 250 people are declared missing in and around the Himalayas, with rainfall reaching 183 mm (7 inches) in some regions.[130]
China and India conduct the 30th round of talks in New Delhi, to resolve the ongoing border disputes, by agreeing to speed up negotiations over the border disputes and to maintain peace and tranquility in border regions.[131]
Swapnil Kusale wins a bronze medal in Men's 50 meter rifle three positions at the 2024 Summer Olympics in Paris.[132]
8 August –The India men's national field hockey team wins a bronze medal at the 2024 Summer Olympics in Paris.[133]
Neeraj Chopra wins a silver medal in Javelin throw at the 2024 Summer Olympics in Paris.[134]
9 August –Aman Sehrawat wins a bronze medal in Men's freestyle 57 kilograms at the 2024 Summer Olympics in Paris.[135]
2024 Kolkata rape and murder: The rape and murder of a female 31 year old post-graduate trainee doctor at R. G. Kar Medical College and Hospital in Kolkata triggers outrage and protests across the country.[136][137][138]
12 August – At least seven people are killed and 10 others are injured in a crowd crush believed to have been caused by a clash between a flower vendor and Hindu worshippers at the Baba Siddhnath Temple in Jehanabad District, Bihar.[139]
21 August – At least 18 people are killed and 37 others are injured in an explosion at a pharmaceutical factory in Anakapalli district, Andhra Pradesh.[140]
23 August –At least 19 people are reported killed following days of flooding in Tripura.[141]
During a meeting in Kyiv, Narendra Modi urges Ukrainian President Volodymyr Zelenskyy to end the Russo-Ukrainian War, and volunteers to act as a mediator in talks between Zelenskyy and Russia. In a later national address, Zelenskyy thanks Modi but states that it is necessary for India to respect international law as well as Ukraine's territorial integrity and sovereignty.[142]
29 August – At least 28 people are reported killed following days of flooding in Gujarat.[143]
September[edit]
4 September – The 2024 Tripura Peace Accord is signed between the National Liberation Front of Tripura and the All Tripura Tiger Force on one side and the State Government of Tripura and the Government of India on the other, ending the 35-year old Insurgency in Tripura.[144][145]
7 September –2023–2025 Manipur violence: At least five people are killed in clashes between the Kuki and Meitei communities in Jiribam District, Manipur.[146]
At least eight people are killed in a building collapse in Lucknow.[147]
The UPSC dismisses and cancels the candidature of trainee IAS officer Puja Khedkar due to her fraulent methods of clearing the UPSC exam.[148][149]
10 September – A five-day curfew and internet blackout is imposed in Manipur due to interethnic violence.[150]
12 September – The Ministry of External Affairs says that about 45 Indian nationals have been discharged from the Russian army and efforts are under way to get a further 50 Indians released.[151]
13 September – Two soldiers are killed in clashes with separatists in Kishtwar, Jammu and Kashmir.[152]
15 September – Arvind Kejriwal announces his resignation as Chief Minister of Delhi effective 17 September.[153] He is succeeded by Atishi Marlena Singh.[154]
17 September – The Supreme Court of India quashes criminal proceedings against 30 soldiers accused of orchestrating the 2021 Nagaland killings, citing the lack of government approval for prosecution.[155]
18 September–6 October – 2024 Jammu and Kashmir Legislative Assembly election:[156] An alliance between the Congress party and the Jammu & Kashmir National Conference wins 48 of 90 seats in the state chamber.[157]
19 September -Ernst & Young employee Anna Sebastian Perayil working at Pune dies due to work pressure in July 2024.[158][159]
Laboratory tests reveal the usage of beef and other animal fats with fish oil in the preparations of Tirupati laddu sweets offered at the Tirupati Temple in Andhra Pradesh,[160] sparking political uproar in the state.[161]
20 September – At least 26 people are killed in West Bengal following a week of flooding blamed by state officials on dams being opened in neighbouring Jharkhand state.[162]
24 September – The first case of clade 1b mpox in India is discovered in a 38-year-old patient in Malappuram District, Kerala who had recently travelled to the United Arab Emirates.[163]
26 September –2024 Jivitputrika tragedy: At least 46 people taking part in Jivitputrika festivities drown in rivers and bodies of water swollen by flooding across Bihar . Thirty-seven of the victims are children.[164]
Up to 275 millimeters (11 inches) of rain falls across Mumbai, causing at least four deaths.[165]
27 September – A Class 2 student is killed in a "tantric sacrificial rite" performed by the owners of his school in Hathras district, Uttar Pradesh for the "prosperity" of the institute.[166]
October[edit]
4 October – A truck collides with a tractor trolley in Bhadohi district, Uttar Pradesh, killing 10 and injuring three.[167]
5 October –2024 Abujhmarh clash: More than 36 Maoists are killed in an operation carried out by District Reserve Guards and Special Task Force in the Abujhmarh forest area in Chhattisgarh.[168]
2024 Haryana Legislative Assembly election: The BJP retains control of the state Legislative Assembly for a record third term, winning 48 of 90 seats.[169][170]
6 October - At least 5 people are killed and many others are hospitalized due to dehydration caused in a crowd crush in the celebration of the 92nd anniversary of the Indian Air Force in Chennai, Tamil Nadu.[171]
11 October- 2024 Tamil Nadu train collision: : A Mysuru Dharbhanga train collides with a freight train on a loop line in Chennai, injuring 19 people.[172]
12 October – Former Maharashtra state minister Baba Siddique is shot dead in Mumbai.[173]
14 October – India recalls its high commissioner to Canada Sanjay Kumar Verma in retaliation to Ottawa placing him and other Indian diplomats under investigation over the murder of Canadian national and Sikh separatist Hardeep Singh Nijjar in 2023.[174] It also orders the expulsion of Canada's acting high commissioner Stewart Ross Wheeler and five other diplomats.[175]
18 October – A migrant worker from Bihar is found shot dead by suspected militants in Shopian District, Jammu and Kashmir.[176]
20 October – Seven workers on a tunnel project are killed in an attack by militants on their camp in Gagangir, Jammu and Kashmir.[176]
21 October – India announces an agreement with China regarding military patrols along the Line of Actual Control between their countries.[177]
25 October – At least two people are killed after Cyclone Dana makes landfall in Odisha.[178]
27 October - 2024 Mumbai stampede :A stampede occurs at the Bandra Terminus railway station in Mumbai, injuring nine people, two of them critically.[179][180]
28 October –The first private military aircraft manufacturing facility in India is inaugurated in Vadodara as part of a joint venture between Airbus and Tata Advanced Systems.[181]
At least 150 people are injured after an explosion caused by a fire at a fireworks storage facility near the Veerarkavu temple in Kasargod, Kerala.[182]
29–31 October – At least ten elephants die from suspected fungus poisoning from consuming contaminated millet in Bandhavgarh National Park, Madhya Pradesh.[183]
November[edit]
3 November – Nine people are injured in a grenade attack on a market in Srinagar.[184]
3 November - The India national cricket team are defeated for the first time at home in a test series by the New Zealand national cricket team.[185]
4 November –2024 Almora bus accident : A bus falls off a gorge in Marchula, Uttarakhand, killing 36 people and injuring 27 others.[186]
6 November – The Jammu and Kashmir Legislative Assembly passes a resolution calling on the union government to restore semi-autonomous status for the region.[187]
11 November –Vistara ceases operations after nine years in operation following a merger agreement with Air India.[188]
Ten members of the Hmar people are killed in disputed circumstances by paramilitary forces near Jiribam, Manipur.[189]
13–20 November – 2024 Jharkhand Legislative Assembly election: The Jharkhand Mukti Morcha-led alliance wins a majority in the Jharkhand Legislative Assembly.[190]
13 November – The Supreme Court outlaws the government's practice of outright demolishing the homes of people accused of criminal offences.[191]
15 November – At least ten infants are killed in a fire at the neonatal ward of a hospital in Jhansi.[192]
16 November – India successfully launches its first hypersonic missile at a test site in Abdul Kalam Island.[193]
20 November – 2024 Maharashtra Legislative Assembly election: The BJP-led Maha Yuti alliance wins a landslide victory in the Maharashtra Legislative Assembly.[194]
22 November – An Indian Navy submarine collides with a fishing vessel off the coast of Goa, leaving two fishermen missing.[195]
24 November –2024 Sambhal violence: Four people are killed following clashes over a court-monitored survey on a mosque in Sambhal, Uttar Pradesh.[196]
26 November – Thirteen-year old Vaibhav Suryavanshi becomes the youngest person to sign a contract to play in the Indian Premier League after joining the Rajasthan Royals for 11 million rupees.[197]
30 November – Cyclone Fengal makes landfall in Tamil Nadu, killing three people.[198]
December[edit]
1 December – At least one person is killed and three others are injured in the collapse of an under-construction tunnel of the Delhi–Mumbai Expressway in Kota, Rajasthan.[199][200]
3 December – The Bangladeshi consulate in Agartala is stormed by protesters demonstrating against the arrest of Hindu community leader Chinmoy Krishna Das in Chittagong.[201]
4 December – A ban on the public consumption of beef comes into effect in Assam.[202]
9 December – A bus ploughs through a crowd at a market in Mumbai, killing nine and injuring 29.[203]
9 December – The suicide of Atul Subhash in Bengaluru generates discussion over the effects of dowry laws on men.[204]
11 December – Sanjay Malhotra is inaugurated as governor of the Reserve Bank of India.[205]
12 December – Eighteen-year old Gukesh Dommaraju of Chennai becomes the youngest person to win the World Chess Championship.[206]
13–14 December – Actor Allu Arjun is arrested overnight on charges involving the death of a woman at a stampede during a film event in Hyderabad.[207]
18 December –A bill calling for the synchronisation of state and general elections fails to pass in the Lok Sabha.[208]
2024 Mumbai boat accident: An Indian Navy speedboat collides with a ferry traveling from Mumbai to Elephanta Island, killing 13 people.[209]
19 December – Five suspected militants are killed in an encounter with security forces in Kulgam district, Jammu and Kashmir.[210]
23 December – The Delhi High Court denies anticipatory bail to trainee IAS officer Puja Khedkar on using fraulent methods to clear the UPSC exam.[211]
25 December – A female engineering student is sexually assaulted by two men inside the Anna University in Chennai.[212]
"""


In [13]:
# Create a DataFrame from the text
df_India_2024 = pd.DataFrame({'text': text.strip().split('\n')})

processed_rows = []
current_date_info = "" # Stores the formatted date like "01 January"
current_year = "2024" # Assuming all events are in 2024 based on the data

for index, row_text in df_India_2024.iterrows():
    row = row_text['text'].strip()

    # Skip lines that are just section headers or empty
    if not row or re.match(r'^[A-Za-z]+\[edit\]$', row):
        continue

    # Attempt to extract a date from the beginning of the line
    # Regex to capture patterns like "DD Month -", "DD–DD Month -", "DD Month –"
    date_match = re.match(r'^(\d{1,2}(?:–\d{1,2})? [A-Za-z]+)\s*[-–]\s*(.*)', row)
    
    if date_match:
        date_part_raw = date_match.group(1).strip()
        event_part = date_match.group(2).strip()

        # Handle bullet points that are part of the date line (like •14 July - ...)
        if date_part_raw.startswith(('•', '*')):
            date_part_raw = date_part_raw[1:].strip()

        # Try to parse the date part. For date ranges, use the first date for grouping.
        try:
            # Add current year to make parsing more robust if year isn't explicitly there
            date_to_parse = f"{date_part_raw} {current_year}" if not current_year in date_part_raw else date_part_raw
            parsed_date = parse(date_to_parse, fuzzy=True)
            current_date_info = parsed_date.strftime("%d %B") # Format to "DD Month"

            # Remove ALL content within square brackets from the event part
            cleaned_event_part = re.sub(r'\[[^\]]*\]', '', event_part).strip()
            processed_rows.append(f"{current_date_info} - {cleaned_event_part}")
        except ValueError:
            # If date parsing fails, treat it as a continuation of the previous event
            if current_date_info:
                # Remove ALL content within square brackets from the continuation line
                cleaned_row = re.sub(r'\[[^\]]*\]', '', row).strip()
                processed_rows.append(f"{current_date_info} - {cleaned_row}")
            else:
                # If no current date is set, this line is unparseable and has no prior context
                pass
    else:
        # If the line does not start with a date pattern, it's a continuation of the previous event
        if current_date_info:
            # Remove ALL content within square brackets from the continuation line
            cleaned_row = re.sub(r'\[[^\]]*\]', '', row).strip()
            processed_rows.append(f"{current_date_info} - {cleaned_row}")
        else:
            # This handles initial lines that are not dates and have no preceding date
            pass

df_India_2024_cleaned = pd.DataFrame({'text': processed_rows})

# Print the DataFrame with the desired format
for i, row in df_India_2024_cleaned.iterrows():
    print(f"{i} {row['text']}")

0 01 January - ISRO successfully launches its first X-Ray polarimeter satellite XPoSat to study the polarization of intense X-Ray sources in space.
1 02 January - 2023–2024 Indian truckers' protests: Protests are organized by Indian truckers against the severity of the newly proposed law in dealing with the hit-and-run cases.
2 03 January - A court in Jaunpur sentences two men to death over the 2005 Jaunpur train bombing which killed 14 people.
3 03 January - A bus carrying 45 passengers collides with a truck in Golaghat district in Assam killing 12 and injuring 30 others.
4 05 January - 2024 Sandeshkhali violence: A team of ED officers are injured in clashes with local supporters of Shahjahan Sheikh in Sandeshkhali, West Bengal.
5 06 January - ISRO's Aditya-L1 spacecraft on India's first solar mission, successfully enters its final orbit around the first Sun-Earth Lagrangian point (L1), approximately 1.5 million kilometers from the earth.
6 06 January - Three Maldives government minis

In [14]:
df_India_2024_cleaned['text'][0]

'01 January - ISRO successfully launches its first X-Ray polarimeter satellite XPoSat to study the polarization of intense X-Ray sources in space.'

## Step 2 - Creating Embeddings Index for our Chatbot

In [1]:
df_India_2024 = pd.read_csv("India_2024_text.csv")
df_India_2024

<IPython.core.display.Javascript object>

Unnamed: 0,text
0,01 January - ISRO successfully launches its fi...
1,02 January - 2023–2024 Indian truckers' protes...
2,03 January - A court in Jaunpur sentences two ...
3,03 January - A bus carrying 45 passengers coll...
4,05 January - 2024 Sandeshkhali violence: A tea...
...,...
184,18 December - A bill calling for the synchroni...
185,18 December - 2024 Mumbai boat accident: An In...
186,19 December - Five suspected militants are kil...
187,23 December - The Delhi High Court denies anti...


In [16]:
!pip install --upgrade openai httpx

Collecting openai
  Using cached openai-1.86.0-py3-none-any.whl.metadata (25 kB)
Using cached openai-1.86.0-py3-none-any.whl (730 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 0.28.0
    Uninstalling openai-0.28.0:
      Successfully uninstalled openai-0.28.0
Successfully installed openai-1.86.0




In [2]:
import openai
print("openai imported successfully! Version:", openai.VERSION)

openai imported successfully! Version: 1.86.0


In [3]:
#Using openai embedding model to create embeddings

from openai import OpenAI
client = OpenAI(
    base_url = "https://openai.vocareum.com/v1",
    api_key = "YOUR_API_KEY"
)

In [4]:
embedding_model_name = "text-embedding-3-small" #use latest openai embedding model

In [5]:
response = client.embeddings.create(
    input=df_India_2024['text'].tolist(), # Input should be a list of strings
    model=embedding_model_name
)

In [6]:
type(response)

openai.types.create_embedding_response.CreateEmbeddingResponse

In [15]:
#  Access the 'embedding' attribute of each Embedding object

embeddings = [item.embedding for item in response.data]

In [17]:
print(f"Successfully extracted {len(embeddings)} embeddings.")

Successfully extracted 189 embeddings.


Check OK ! Matches the records

In [19]:
print(f"Type of an item in 'response.data': {type(response.data[0])}")

Type of an item in 'response.data': <class 'openai.types.embedding.Embedding'>


Check OK ! Type of item in 'response.data' is 'embeddings.Embedding'

In [24]:
response.data[0].embedding

[-0.01012003980576992,
 0.007087681442499161,
 -0.008387263864278793,
 -0.0007515656179748476,
 -0.009572023525834084,
 0.024321498349308968,
 0.012369518168270588,
 -0.009812107309699059,
 0.09369518607854843,
 -0.0636325553059578,
 0.006539664696902037,
 0.004634654615074396,
 -0.0221503097563982,
 -0.031148219481110573,
 0.014624214731156826,
 0.0066962409764528275,
 -0.05089769512414932,
 0.027473898604512215,
 -0.0732358992099762,
 0.035949889570474625,
 0.0020420143846422434,
 -0.01742170937359333,
 -0.06542796641588211,
 -0.03421711176633835,
 0.01888308674097061,
 -0.0013700415147468448,
 -0.0013948327396064997,
 0.006732775364071131,
 0.028977030888199806,
 -0.021753650158643723,
 0.009770353324711323,
 -0.01643005944788456,
 0.0042145089246332645,
 -0.007766178343445063,
 0.00042927966569550335,
 0.003092379542067647,
 -0.007197285071015358,
 0.032129429280757904,
 -0.022233817726373672,
 -0.014405008405447006,
 -0.002985385712236166,
 -0.016764089465141296,
 0.01403966359794

In [25]:
len(response.data[0].embedding)

1536

Check OK ! Length of the first embedding vector: 1536 as per our expectation.

In [26]:
embeddings

[[-0.01012003980576992,
  0.007087681442499161,
  -0.008387263864278793,
  -0.0007515656179748476,
  -0.009572023525834084,
  0.024321498349308968,
  0.012369518168270588,
  -0.009812107309699059,
  0.09369518607854843,
  -0.0636325553059578,
  0.006539664696902037,
  0.004634654615074396,
  -0.0221503097563982,
  -0.031148219481110573,
  0.014624214731156826,
  0.0066962409764528275,
  -0.05089769512414932,
  0.027473898604512215,
  -0.0732358992099762,
  0.035949889570474625,
  0.0020420143846422434,
  -0.01742170937359333,
  -0.06542796641588211,
  -0.03421711176633835,
  0.01888308674097061,
  -0.0013700415147468448,
  -0.0013948327396064997,
  0.006732775364071131,
  0.028977030888199806,
  -0.021753650158643723,
  0.009770353324711323,
  -0.01643005944788456,
  0.0042145089246332645,
  -0.007766178343445063,
  0.00042927966569550335,
  0.003092379542067647,
  -0.007197285071015358,
  0.032129429280757904,
  -0.022233817726373672,
  -0.014405008405447006,
  -0.002985385712236166,


In [27]:
df_India_2024["embeddings"] = embeddings
df_India_2024

Unnamed: 0,text,embeddings
0,01 January - ISRO successfully launches its fi...,"[-0.01012003980576992, 0.007087681442499161, -..."
1,02 January - 2023–2024 Indian truckers' protes...,"[0.01802976243197918, -0.015374388545751572, 0..."
2,03 January - A court in Jaunpur sentences two ...,"[0.0114059429615736, 0.022666586562991142, 0.0..."
3,03 January - A bus carrying 45 passengers coll...,"[0.01869521662592888, 0.020528465509414673, 0...."
4,05 January - 2024 Sandeshkhali violence: A tea...,"[0.049317616969347, -0.021732015535235405, 0.0..."
...,...,...
184,18 December - A bill calling for the synchroni...,"[0.08060289174318314, 0.043887682259082794, 0...."
185,18 December - 2024 Mumbai boat accident: An In...,"[0.020252235233783722, 0.035780277103185654, 0..."
186,19 December - Five suspected militants are kil...,"[0.019430195912718773, 0.016485922038555145, 0..."
187,23 December - The Delhi High Court denies anti...,"[0.03180204704403877, 0.03313806653022766, 0.0..."


In [28]:
# savings the embeddings to csv

df_India_2024.to_csv('India_2024_embeddings.csv')

## Step 3 - Performing Semantic Text Search and Cosine Distance (1 - Cosine Similarity)

In [1]:
df_India_2024 = pd.read_csv("India_2024_embeddings.csv", index_col=0)
df_India_2024

<IPython.core.display.Javascript object>

Unnamed: 0,text,embeddings
0,01 January - ISRO successfully launches its fi...,"[-0.01012003980576992, 0.007087681442499161, -..."
1,02 January - 2023–2024 Indian truckers' protes...,"[0.01802976243197918, -0.015374388545751572, 0..."
2,03 January - A court in Jaunpur sentences two ...,"[0.0114059429615736, 0.022666586562991142, 0.0..."
3,03 January - A bus carrying 45 passengers coll...,"[0.01869521662592888, 0.020528465509414673, 0...."
4,05 January - 2024 Sandeshkhali violence: A tea...,"[0.049317616969347, -0.021732015535235405, 0.0..."
...,...,...
184,18 December - A bill calling for the synchroni...,"[0.08060289174318314, 0.043887682259082794, 0...."
185,18 December - 2024 Mumbai boat accident: An In...,"[0.020252235233783722, 0.035780277103185654, 0..."
186,19 December - Five suspected militants are kil...,"[0.019430195912718773, 0.016485922038555145, 0..."
187,23 December - The Delhi High Court denies anti...,"[0.03180204704403877, 0.03313806653022766, 0.0..."


In [2]:
df_India_2024['embeddings'].apply(eval).apply(np.array)

<IPython.core.display.Javascript object>

0      [-0.01012003980576992, 0.007087681442499161, -...
1      [0.01802976243197918, -0.015374388545751572, 0...
2      [0.0114059429615736, 0.022666586562991142, 0.0...
3      [0.01869521662592888, 0.020528465509414673, 0....
4      [0.049317616969347, -0.021732015535235405, 0.0...
                             ...                        
184    [0.08060289174318314, 0.043887682259082794, 0....
185    [0.020252235233783722, 0.035780277103185654, 0...
186    [0.019430195912718773, 0.016485922038555145, 0...
187    [0.03180204704403877, 0.03313806653022766, 0.0...
188    [0.05866081640124321, 0.007517107762396336, 0....
Name: embeddings, Length: 189, dtype: object

In [3]:
df_India_2024['embeddings'] = df_India_2024['embeddings'].apply(eval).apply(np.array)

<IPython.core.display.Javascript object>

In [4]:
df_India_2024

Unnamed: 0,text,embeddings
0,01 January - ISRO successfully launches its fi...,"[-0.01012003980576992, 0.007087681442499161, -..."
1,02 January - 2023–2024 Indian truckers' protes...,"[0.01802976243197918, -0.015374388545751572, 0..."
2,03 January - A court in Jaunpur sentences two ...,"[0.0114059429615736, 0.022666586562991142, 0.0..."
3,03 January - A bus carrying 45 passengers coll...,"[0.01869521662592888, 0.020528465509414673, 0...."
4,05 January - 2024 Sandeshkhali violence: A tea...,"[0.049317616969347, -0.021732015535235405, 0.0..."
...,...,...
184,18 December - A bill calling for the synchroni...,"[0.08060289174318314, 0.043887682259082794, 0...."
185,18 December - 2024 Mumbai boat accident: An In...,"[0.020252235233783722, 0.035780277103185654, 0..."
186,19 December - Five suspected militants are kil...,"[0.019430195912718773, 0.016485922038555145, 0..."
187,23 December - The Delhi High Court denies anti...,"[0.03180204704403877, 0.03313806653022766, 0.0..."


In [5]:
question = "When did ISRO launched satellite?" 

In [6]:
from openai import OpenAI, Embedding

client = OpenAI(
    base_url = "https://openai.vocareum.com/v1",
    api_key = "YOUR_API_KEY"
)

In [7]:
embedding_model_name = "text-embedding-3-small" #use latest openai embedding model

In [8]:
question_embedding = client.embeddings.create(input=[question], model=embedding_model_name)
question_embedding

CreateEmbeddingResponse(data=[Embedding(embedding=[-0.0005648165242746472, 0.002430807566270232, 0.013194723054766655, -0.05155904218554497, 0.007441092282533646, 0.0035426511894911528, 0.03415583446621895, 0.0018831451889127493, 0.047533534467220306, -0.007481753826141357, -0.005321600940078497, 0.0033215531148016453, -0.033972855657339096, -0.07274378836154938, -0.002477822592481971, -0.02451900765299797, -0.04112931713461876, 0.012046030722558498, -0.08538958430290222, 0.018033467233181, 0.05302286520600319, 0.008808341808617115, -0.029052788391709328, -0.02767029032111168, 0.03907589986920357, 0.006922655273228884, 0.013316708616912365, -0.015827568247914314, -0.003946726676076651, -0.019222822040319443, 0.024580001831054688, -0.01587839610874653, -0.01279827207326889, -0.018755212426185608, -0.019548114389181137, 0.050298530608415604, 0.02368544414639473, -0.010063772089779377, -0.02167268842458725, -0.0014117235550656915, -0.021733682602643967, -0.019426129758358, 0.0440366268157

In [9]:
type(question_embedding.data[0].embedding)

list

In [10]:
len(question_embedding.data[0].embedding)

1536

In [67]:
from scipy.spatial import distance

In [11]:
df_India_2024['embeddings']

0      [-0.01012003980576992, 0.007087681442499161, -...
1      [0.01802976243197918, -0.015374388545751572, 0...
2      [0.0114059429615736, 0.022666586562991142, 0.0...
3      [0.01869521662592888, 0.020528465509414673, 0....
4      [0.049317616969347, -0.021732015535235405, 0.0...
                             ...                        
184    [0.08060289174318314, 0.043887682259082794, 0....
185    [0.020252235233783722, 0.035780277103185654, 0...
186    [0.019430195912718773, 0.016485922038555145, 0...
187    [0.03180204704403877, 0.03313806653022766, 0.0...
188    [0.05866081640124321, 0.007517107762396336, 0....
Name: embeddings, Length: 189, dtype: object

In [15]:
from scipy.spatial import distance
import numpy as np

# --- 1. Prepare the question embedding as a NumPy array ---
# Based on your output, question_embedding.data[0].embedding is a list.
# Convert it to a NumPy array once, outside the loop/apply, for efficiency.
question_embedding_np = np.array(question_embedding.data[0].embedding)

print(f"Shape of question_embedding_np: {question_embedding_np.shape}")
print(f"Type of question_embedding_np: {type(question_embedding_np)}")

# --- 2. Calculate cosine distances for each document embedding using .apply() ---
# The lambda function will be called for each individual 'doc_embed' (which is a list of 1536 floats)
# from the 'embeddings' column.
df_India_2024['distances'] = df_India_2024['embeddings'].apply(
    lambda doc_embed: distance.cosine(question_embedding_np, np.array(doc_embed))
)


print("\nTop 5 most similar entries:")
for index, row in df_India_2024.head(5).iterrows():
    print(f"Text: {row['text']}")
    print(f"Cosine Distances: {row['distances']:.4f}\n")

Shape of question_embedding_np: (1536,)
Type of question_embedding_np: <class 'numpy.ndarray'>

Top 5 most similar entries:
Text: 01 January - ISRO successfully launches its first X-Ray polarimeter satellite XPoSat to study the polarization of intense X-Ray sources in space.
Cosine Distances: 0.3937

Text: 02 January - 2023–2024 Indian truckers' protests: Protests are organized by Indian truckers against the severity of the newly proposed law in dealing with the hit-and-run cases.
Cosine Distances: 0.7940

Text: 03 January - A court in Jaunpur sentences two men to death over the 2005 Jaunpur train bombing which killed 14 people.
Cosine Distances: 0.8602

Text: 03 January - A bus carrying 45 passengers collides with a truck in Golaghat district in Assam killing 12 and injuring 30 others.
Cosine Distances: 0.8429

Text: 05 January - 2024 Sandeshkhali violence: A team of ED officers are injured in clashes with local supporters of Shahjahan Sheikh in Sandeshkhali, West Bengal.
Cosine Dista

In [13]:
df_India_2024

Unnamed: 0,text,embeddings,distances
0,01 January - ISRO successfully launches its fi...,"[-0.01012003980576992, 0.007087681442499161, -...",0.393724
1,02 January - 2023–2024 Indian truckers' protes...,"[0.01802976243197918, -0.015374388545751572, 0...",0.793996
2,03 January - A court in Jaunpur sentences two ...,"[0.0114059429615736, 0.022666586562991142, 0.0...",0.860219
3,03 January - A bus carrying 45 passengers coll...,"[0.01869521662592888, 0.020528465509414673, 0....",0.842883
4,05 January - 2024 Sandeshkhali violence: A tea...,"[0.049317616969347, -0.021732015535235405, 0.0...",0.883672
...,...,...,...
184,18 December - A bill calling for the synchroni...,"[0.08060289174318314, 0.043887682259082794, 0....",0.813319
185,18 December - 2024 Mumbai boat accident: An In...,"[0.020252235233783722, 0.035780277103185654, 0...",0.770411
186,19 December - Five suspected militants are kil...,"[0.019430195912718773, 0.016485922038555145, 0...",0.871175
187,23 December - The Delhi High Court denies anti...,"[0.03180204704403877, 0.03313806653022766, 0.0...",0.803764


In [14]:
df_India_2024.to_csv('India_2024_distances.csv')

## Step 4 - Finding relevant data with sorting the distances

In [24]:
df_India_2024.sort_values(by = 'distances')

Unnamed: 0,text,embeddings,distances
27,17 February - ISRO launches the INSAT-3DS mete...,[ 0.00683427 0.02223246 0.0419548 ... 0.03...,0.306182
0,01 January - ISRO successfully launches its fi...,[-0.01012004 0.00708768 -0.00838726 ... -0.00...,0.393724
5,06 January - ISRO's Aditya-L1 spacecraft on In...,[ 0.01895185 -0.02676057 0.06545259 ... 0.03...,0.482712
106,12 July - •14 July - ISRO successfully launche...,[0.02670608 0.00750543 0.04870995 ... 0.010489...,0.518643
170,16 November - India successfully launches its ...,[ 0.00745153 0.03450911 0.06229964 ... 0.00...,0.589494
...,...,...,...
71,28 May - Seventeen people are killed and 12 ot...,[ 0.0374497 0.09127934 0.00351448 ... 0.00...,0.907009
160,31 October - At least ten elephants die from s...,[0.03771039 0.0207937 0.0539408 ... 0.048185...,0.910944
126,23 August - At least 19 people are reported ki...,[ 0.06143994 0.06807875 0.0277923 ... 0.00...,0.911160
168,13 November - The Supreme Court outlaws the go...,[ 0.05627719 0.02072385 0.0492133 ... 0.00...,0.921487


In [26]:
df_India_2024.sort_values(by = 'distances').to_csv("India_2024_distances_sorted.csv")

## Step 5 - Compose a Custom Text Prompt (MAGIC STEP)

In [1]:
embedding_model = 'text-embedding-3-small'

# Token limit is 8191 per item.

In [2]:
#Creating a custom prompt template

prompt_template = """
Answer the question based on the context below, and if the question can't be answered on the context,
say 'I don't know!!'

Context:

{}

---

Question:{}
Answer:
"""

In [3]:
context = "Here is the context for question to be answered."

In [4]:
question = "When did ISRO launched satellite?" 

In [5]:
#creating answer as an output

print(prompt_template.format(context,question))


Answer the question based on the context below, and if the question can't be answered on the context,
say 'I don't know!!'

Context:

Here is the context for question to be answered.

---

Question:When did ISRO launched satellite?
Answer:



In [6]:
import tiktoken

tokenizer = tiktoken.get_encoding("cl100k_base")
tokenizer

<Encoding 'cl100k_base'>

In [7]:
tokenizer.encode(question)

[4599, 1550, 3507, 1308, 11887, 24088, 30]

In [8]:
print(tokenizer.encode(prompt_template))

[198, 16533, 279, 3488, 3196, 389, 279, 2317, 3770, 11, 323, 422, 279, 3488, 649, 956, 387, 19089, 389, 279, 2317, 345, 37890, 364, 40, 1541, 956, 1440, 3001, 3961, 2014, 1473, 32583, 45464, 14924, 12832, 534, 16533, 512]


In [9]:
# Check for existing token count with question and prompt

current_token_count = len(tokenizer.encode(question)) + len(tokenizer.encode(prompt_template))
current_token_count

46

In [10]:
#Limiting max tokens below the model limit for scalability

max_tokens = 1000

In [11]:
# reading the dataframe

df_India_2024 = pd.read_csv("India_2024_distances_sorted.csv", index_col = 0)
df_India_2024

<IPython.core.display.Javascript object>

Unnamed: 0,text,embeddings,distances
27,17 February - ISRO launches the INSAT-3DS mete...,[ 0.00683427 0.02223246 0.0419548 ... 0.03...,0.306182
0,01 January - ISRO successfully launches its fi...,[-0.01012004 0.00708768 -0.00838726 ... -0.00...,0.393724
5,06 January - ISRO's Aditya-L1 spacecraft on In...,[ 0.01895185 -0.02676057 0.06545259 ... 0.03...,0.482712
106,12 July - •14 July - ISRO successfully launche...,[0.02670608 0.00750543 0.04870995 ... 0.010489...,0.518643
170,16 November - India successfully launches its ...,[ 0.00745153 0.03450911 0.06229964 ... 0.00...,0.589494
...,...,...,...
71,28 May - Seventeen people are killed and 12 ot...,[ 0.0374497 0.09127934 0.00351448 ... 0.00...,0.907009
160,31 October - At least ten elephants die from s...,[0.03771039 0.0207937 0.0539408 ... 0.048185...,0.910944
126,23 August - At least 19 people are reported ki...,[ 0.06143994 0.06807875 0.0277923 ... 0.00...,0.911160
168,13 November - The Supreme Court outlaws the go...,[ 0.05627719 0.02072385 0.0492133 ... 0.00...,0.921487


In [12]:
#population context with meaningful information with check for the context limit

context = []

for text in df_India_2024['text'].values:
    text_token_count = len(tokenizer.encode(text))
    current_token_count += text_token_count

    if current_token_count <= max_tokens:
        context.append(text)
    else:
        break

In [13]:
context

['17 February - ISRO launches the INSAT-3DS meteorological satellite.',
 '01 January - ISRO successfully launches its first X-Ray polarimeter satellite XPoSat to study the polarization of intense X-Ray sources in space.',
 "06 January - ISRO's Aditya-L1 spacecraft on India's first solar mission, successfully enters its final orbit around the first Sun-Earth Lagrangian point (L1), approximately 1.5 million kilometers from the earth.",
 '12 July - •14 July - ISRO successfully launched chandrayan III dated on 14 July 2024* 15 July – Four soldiers including an officer are killed in an ambush by militants in Doda district, Jammu and Kashmir.',
 '16 November - India successfully launches its first hypersonic missile at a test site in Abdul Kalam Island.',
 '28 October - The first private military aircraft manufacturing facility in India is inaugurated in Vadodara as part of a joint venture between Airbus and Tata Advanced Systems.',
 '12 February - Seven of the eight Indian Navy veterans who

In [14]:
print(prompt_template.format("\n\n".join(context), question))


Answer the question based on the context below, and if the question can't be answered on the context,
say 'I don't know!!'

Context:

17 February - ISRO launches the INSAT-3DS meteorological satellite.

01 January - ISRO successfully launches its first X-Ray polarimeter satellite XPoSat to study the polarization of intense X-Ray sources in space.

06 January - ISRO's Aditya-L1 spacecraft on India's first solar mission, successfully enters its final orbit around the first Sun-Earth Lagrangian point (L1), approximately 1.5 million kilometers from the earth.

12 July - •14 July - ISRO successfully launched chandrayan III dated on 14 July 2024* 15 July – Four soldiers including an officer are killed in an ambush by militants in Doda district, Jammu and Kashmir.

16 November - India successfully launches its first hypersonic missile at a test site in Abdul Kalam Island.

28 October - The first private military aircraft manufacturing facility in India is inaugurated in Vadodara as part of a

## Step 6 - Querying a Completion Model

In [15]:
from openai import OpenAI

client = OpenAI(
    base_url = "https://openai.vocareum.com/v1",
    api_key = "YOUR_API_KEY"
)

### Question 1

In [16]:
question = "When did ISRO launched satellite?" 

In [17]:
#Custom Prompt 1

client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages=[
            {"role": "user", "content": prompt_template.format("\n\n".join(context), question)}
        ]
).choices[0].message.content

'ISRO launched the INSAT-3DS meteorological satellite on 17 February.'

In [20]:
#Basic Prompt 1

client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": question}
            ],
            max_tokens=max_tokens,
            temperature=0.7 # A bit higher temperature for general knowledge
        ).choices[0].message.content

'ISRO has launched several satellites over the years. The first satellite, Aryabhata, was launched on April 19, 1975. Since then, ISRO has launched numerous satellites for various purposes including communication, remote sensing, and scientific research.'

### Question 2

In [21]:
question = "What military activities happened in year 2024?"

In [22]:
#Custom Prompt 2

response = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages=[
            {"role": "user", "content": prompt_template.format("\n\n".join(context), question)}
        ]
).choices[0].message.content

print(response)

- The first private military aircraft manufacturing facility in India was inaugurated in Vadodara.
- An Indian Navy submarine collided with a fishing vessel off the coast of Goa.
- 2024 Reasi attack: Nine people were killed and 33 others were injured after a bus carrying Hindu pilgrims was attacked by Lashkar-e-Taiba militants near Reasi, Jammu and Kashmir.
- The Ministry of External Affairs mentioned that about 45 Indian nationals were discharged from the Russian army.
- Air India Express canceled more than 85 flights due to staff calling in sick, linked to a protest against working conditions.
- The UPSC dismissed and canceled the candidature of trainee IAS officer Puja Khedkar.
- India announced an agreement with China regarding military patrols along the Line of Actual Control.


In [24]:
#Basic Prompt 2

response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": question}
            ],
            max_tokens=max_tokens,
            temperature=0.7 # A bit higher temperature for general knowledge
        ).choices[0].message.content

print(response)

There is no specific information available about military activities that occurred in the year 2024 as it is a future date. Military activities are often classified and not publicly disclosed until after they have taken place. It is best to refer to official sources or news reports for information on military activities in the year 2024.


In [None]:
def answer_question(
    question, df, max_prompt_tokens=850, max_answer_tokens=150
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model

    If the model produces an error, return an empty string
    """

    prompt = create_prompt(question, df, max_prompt_tokens)

    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""