<a href="https://colab.research.google.com/github/mr-cri-spy/Movie_seat_enq_bot/blob/main/Movie_bot_test_updated_theatre_and_return_structured_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Project Structure

1. User Query Input → Natural Language
2. LLM Layer → Understand movie name, city, date
3. Web Scraper Layer → Hit BookMyShow or PaytmMovies URLs
4. Parse Theatre & Seat Info
5. Display Output


Query Understanding + Web Search

User Input
LLM-based movie/city extraction (using transformers)
Build search URLs dynamically (simulating a BookMyShow search)
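
As a hedged illustration of the URL-building step, the sketch below normalises the movie and city into slugs before filling in the BookMyShow path used in this notebook. The helper names `slugify` and `build_search_url` are illustrative and not part of the notebook, and the URL pattern itself is taken as given rather than verified against the live site.

```python
import re

def slugify(text, sep="-"):
    """Lowercase the text, drop punctuation, and join the remaining words with `sep`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sep.join(words)

def build_search_url(movie, city):
    # URL pattern copied from this notebook; it is an assumption that the site accepts it.
    city_slug = slugify(city, sep="")   # "New Delhi" -> "newdelhi"
    movie_slug = slugify(movie)         # "Su from so" -> "su-from-so"
    return f"https://in.bookmyshow.com/explore/movies-{city_slug}/{movie_slug}"

print(build_search_url("Su from so", "Mysore"))
```

Stripping punctuation up front keeps stray characters such as `?` or `!` from a user query out of the path.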

In [21]:

!pip install transformers beautifulsoup4 requests

# --- Imports ---
import re
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

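# --- Step 1: LLM-based Entity Extraction ---
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
nlp = pipeline("ner", aggregation_strategy="simple")

def extract_entities(user_input):
    """Return (movie, city, date) guessed from NER labels; any of them may be None."""
    entities = nlp(user_input)
    movie = city = date = None
    for ent in entities:
        label = ent['entity_group']
        if label in ('ORG', 'MISC'):
            movie = ent['word']
        elif label == 'LOC':
            city = ent['word']
        elif label == 'DATE':
            # The default CoNLL-03 model only emits PER, ORG, LOC and MISC,
            # so this branch stays unused unless a model with a DATE label is loaded.
            date = ent['word']
    return movie, city, date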

# --- Step 2: Simulate Web Search URL (e.g., BookMyShow or PaytmMovies) ---
def construct_search_url(movie, city):
    city = city.lower().replace(" ", "")
    movie = movie.lower().replace(" ", "-")
    return f"https://in.bookmyshow.com/explore/movies-{city}/{movie}"

# --- Step 3: User Query Input ---
user_input = input("Ask me about a movie seat availability: ")
movie, city, date = extract_entities(user_input)

print(" Movie:", movie)
print(" City:", city)
print(" Date:", date)

search_url = construct_search_url(movie or "Su from so", city or "mysore")
print(" Searching:", search_url)


# --- Step 4: Scrape Movie Page (Theatre Names, Timings) ---
def scrape_theatre_list(url):
    headers = {
        "User-Agent": "Mozilla/5.0"
    }

    response = requests.get(url, headers=headers, timeout=10)  # avoid hanging on a slow server

    if response.status_code != 200:
        print(" Failed to load the page. Maybe wrong city/movie name.")
        return []

    soup = BeautifulSoup(response.text, "html.parser")

    # Try to find theatre info – this is simplified
    theatres = soup.find_all("div", class_="__venue-name")  # might need tuning
    theatre_list = []
    for theatre in theatres:
        name = theatre.get_text(strip=True)
        theatre_list.append(name)

    if not theatre_list:
        print(" Couldn’t extract theatres. The site might be JS-rendered.")
    return theatre_list

# --- Step 5: Run Theatre Extraction ---
print("\n Fetching available theatres...\n")
theatres = scrape_theatre_list(search_url)

if theatres:
    for idx, t in enumerate(theatres, 1):
        print(f"{idx}. {t}")
else:
    print(" No theatres found. Let's simulate with static HTML next.")





No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


Ask me about a movie seat availability: kingdome movie availble in mysore
 Movie: None
 City: None
 Date: None
 Searching: https://in.bookmyshow.com/explore/movies-mysore/su-from-so

 Fetching available theatres...

 Failed to load the page. Maybe wrong city/movie name.
 No theatres found. Let's simulate with static HTML next.
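
In the run above, the NER step returned `None` for all three fields, so the bot fell back to its hard-coded defaults. One likely factor is that the query was typed in lowercase while the default model is cased. A minimal sketch of a rule-based fallback follows, assuming a small hand-maintained city list. `KNOWN_CITIES` and `fallback_extract` are hypothetical names, and the "word before *movie*" rule only handles single-word titles.

```python
import re

# Hypothetical city list for illustration; a real bot would load a fuller gazetteer.
KNOWN_CITIES = ("mysore", "bangalore", "chennai", "hyderabad")

def fallback_extract(text, known_cities=KNOWN_CITIES):
    """Rule-based fallback for when the NER model returns nothing."""
    lowered = text.lower()

    # City: first entry of known_cities that appears as a whole word.
    city = next(
        (c for c in known_cities if re.search(rf"\b{re.escape(c)}\b", lowered)),
        None,
    )

    # Movie: the single word right before "movie" (multi-word titles are not handled).
    movie_match = re.search(r"\b([\w'-]+)\s+movie\b", text, flags=re.IGNORECASE)
    movie = movie_match.group(1) if movie_match else None

    # Date: only a few relative keywords are recognised.
    date_match = re.search(r"\b(today|tomorrow|tonight)\b", lowered)
    date = date_match.group(1) if date_match else None

    return movie, city, date

print(fallback_extract("kingdome movie availble in mysore"))
```

A fallback like this could be called only when `extract_entities` leaves a field empty, so the model's answer still takes priority when it has one.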


Simulated seat map (fallback when the live page cannot be scraped)

In [22]:
# --- Step 5: Run Theatre Extraction ---
print("\n Fetching available theatres...\n")
theatres = scrape_theatre_list(search_url)

if theatres:
    for idx, t in enumerate(theatres, 1):
        print(f"{idx}. {t}")
else:
    print(" No theatres found. Let's simulate with static HTML next.")


# --- Step 6: Simulated HTML (fake seat map) ---
simulated_html = """
<div class="theatre">
    <h3>Kingdome Theatre, Mysore</h3>
    <div class="seats">
        <span class="seat available">A1</span>
        <span class="seat booked">A2</span>
        <span class="seat available">A3</span>
        <span class="seat booked">A4</span>
        <span class="seat available">B1</span>
        <span class="seat booked">B2</span>
        <span class="seat available">B3</span>
        <span class="seat available">B4</span>
        <!-- Simulate more if needed -->
    </div>
</div>
"""

# --- Step 7: Parse Simulated Seats ---
def parse_seat_availability(html):
    soup = BeautifulSoup(html, "html.parser")
    theatre_name = soup.find("h3").text.strip()
    seats = soup.find_all("span", class_="seat")

    total = len(seats)
    booked = len([s for s in seats if 'booked' in s['class']])
    available = total - booked

    return theatre_name, total, booked, available

# --- Step 8: Display Results ---
theatre, total, booked, available = parse_seat_availability(simulated_html)

print(f"\n {theatre}")
print(f" Total Seats: {total}")
print(f" Booked Seats: {booked}")
print(f" Available Seats: {available}")







 Fetching available theatres...

 Failed to load the page. Maybe wrong city/movie name.
 No theatres found. Let's simulate with static HTML next.

 Kingdome Theatre, Mysore
 Total Seats: 8
 Booked Seats: 3
 Available Seats: 5
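
The counts above say how many seats are free but not which ones. As a rough sketch, the seat labels can be grouped by status from the same kind of markup. This version uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra installs. `SeatCollector` and `seats_by_status` are illustrative names, and the sample HTML is a shortened copy of the simulated seat map.

```python
from collections import defaultdict
from html.parser import HTMLParser

class SeatCollector(HTMLParser):
    """Collects seat labels from <span class="seat ..."> elements, keyed by status."""

    def __init__(self):
        super().__init__()
        self.seats = defaultdict(list)  # status -> list of seat labels
        self._status = None             # status of the seat <span> currently open

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "seat" in classes:
            self._status = "booked" if "booked" in classes else "available"

    def handle_data(self, data):
        # Only text inside a seat <span> is recorded.
        if self._status and data.strip():
            self.seats[self._status].append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._status = None

def seats_by_status(html):
    collector = SeatCollector()
    collector.feed(html)
    return dict(collector.seats)

sample_html = """
<div class="seats">
    <span class="seat available">A1</span>
    <span class="seat booked">A2</span>
    <span class="seat available">B1</span>
</div>
"""
print(seats_by_status(sample_html))
```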


In [23]:
# Setup
!pip install transformers beautifulsoup4 requests

from transformers import pipeline
from bs4 import BeautifulSoup

# Load LLM-based NER
# (aggregation_strategy="simple" replaces the deprecated grouped_entities=True)
ner_pipeline = pipeline("ner", aggregation_strategy="simple")

#  Step 1: Extract Movie & City
def extract_entities(text):
    ents = ner_pipeline(text)
    movie = city = None
    for ent in ents:
        label = ent['entity_group']
        word = ent['word']
        if label in ['ORG', 'MISC']:
            movie = movie or word
        elif label == 'LOC':
            city = city or word
    return movie, city

# Step 2: Simulate Theatre Seat HTML
def get_simulated_theatre_html(movie_name, city_name):
    return f"""
    <div class="theatre">
        <h3>{movie_name or "Some Movie"} - Kingdome Theatre, {city_name or "Mysore"}</h3>
        <div class="seats">
            <span class="seat available">A1</span>
            <span class="seat booked">A2</span>
            <span class="seat available">A3</span>
            <span class="seat booked">A4</span>
            <span class="seat available">B1</span>
            <span class="seat booked">B2</span>
            <span class="seat available">B3</span>
            <span class="seat available">B4</span>
        </div>
    </div>
    """

#  Step 3: Parse Seat Info
def parse_seat_availability(html):
    soup = BeautifulSoup(html, "html.parser")
    theatre_name = soup.find("h3").text.strip()
    seats = soup.find_all("span", class_="seat")

    total = len(seats)
    booked = len([s for s in seats if 'booked' in s['class']])
    available = total - booked

    return theatre_name, total, booked, available

#  Step 4: Full Chatbot Flow
def movie_bot_response(user_input):
    print("\n User Asked:", user_input)

    movie, city = extract_entities(user_input)
    print(" Movie:", movie)
    print(" City:", city)

    html = get_simulated_theatre_html(movie, city)
    theatre, total, booked, available = parse_seat_availability(html)

    print("\n", theatre)
    print(" Total Seats:", total)
    print(" Booked Seats:", booked)
    print(" Available Seats:", available)

#  TEST HERE
movie_bot_response("In Mysore are available Kingdome movie seats?")






 User Asked: In Mysore are available Kingdome movie seats?
 Movie: Kingdome
 City: Mysore

 Kingdome - Kingdome Theatre, Mysore
 Total Seats: 8
 Booked Seats: 3
 Available Seats: 5


In [24]:
!pip install spacy
!python -m spacy download en_core_web_sm

import spacy
from bs4 import BeautifulSoup
import random

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

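# Extract movie and city
def extract_movie_city_spacy(text):
    """Return (movie, city); either may be None if spaCy tags no matching entity."""
    doc = nlp(text)
    movie = city = None
    for ent in doc.ents:
        if ent.label_ == "GPE":  # city/place
            city = ent.text
        elif ent.label_ in ["ORG", "WORK_OF_ART"]:  # possible movie
            movie = ent.text
    # If several entities share a label, the last one wins.
    return movie, city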


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 33.0 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [17]:
# Generate simulated seat map HTML for multiple theatres
def simulate_multiple_theatres(movie, city):
    theatre_list = ["Kingdome", "Inox Forum", "PVR Mall", "DRC Cinemas"]
    html_blocks = []

    for theatre in theatre_list:
        total_seats = random.randint(80, 200)
        booked = random.randint(10, total_seats - 10)
        available = total_seats - booked

        seat_html = f"""
        <div class="theatre">
            <h3>{movie or 'Some Movie'} - {theatre}, {city or 'City'}</h3>
            <div class="seats" data-total="{total_seats}" data-booked="{booked}" data-available="{available}">
            </div>
        </div>
        """
        html_blocks.append(seat_html)

    return "\n".join(html_blocks)
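
Because `simulate_multiple_theatres` draws from the global `random` module, each run produces different counts. A minimal sketch, assuming reproducible figures are wanted for testing: the same ranges are drawn from a seeded `random.Random` instance. `simulate_counts` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random

def simulate_counts(theatres, seed=None):
    """Generate fake seat counts per theatre; the same seed gives the same numbers."""
    rng = random.Random(seed)  # private generator, leaves the global RNG untouched
    results = []
    for name in theatres:
        total = rng.randint(80, 200)
        booked = rng.randint(10, total - 10)
        results.append({
            "theatre": name,
            "total": total,
            "booked": booked,
            "available": total - booked,
        })
    return results

for row in simulate_counts(["Kingdome", "Inox Forum", "PVR Mall", "DRC Cinemas"], seed=42):
    print(row)
```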


In [18]:
# Parse each theatre and return structured results
def parse_all_theatres(html):
    soup = BeautifulSoup(html, "html.parser")
    theatres = soup.find_all("div", class_="theatre")
    parsed = []

    for theatre in theatres:
        name = theatre.find("h3").text.strip()
        seats_info = theatre.find("div", class_="seats")
        total = int(seats_info['data-total'])
        booked = int(seats_info['data-booked'])
        available = int(seats_info['data-available'])
        parsed.append((name, total, booked, available))

    return parsed


In [25]:
def movie_bot_v2(user_input):
    print("\n User Asked:", user_input)

    movie, city = extract_movie_city_spacy(user_input)
    print(" Movie:", movie)
    print(" City:", city)

    html = simulate_multiple_theatres(movie, city)
    results = parse_all_theatres(html)

    print("\n Results:")
    for name, total, booked, available in results:
        print(f"\n {name}")
        print(f" Total: {total}")
        print(f" Booked: {booked}")
        print(f" Available: {available}")


In [26]:
movie_bot_v2("Show me Kalki movie seats in Mysore today")



 User Asked: Show me Kalki movie seats in Mysore today
 Movie: None
 City: Mysore

 Results:

 Some Movie - Kingdome, Mysore
 Total: 154
 Booked: 133
 Available: 21

 Some Movie - Inox Forum, Mysore
 Total: 109
 Booked: 20
 Available: 89

 Some Movie - PVR Mall, Mysore
 Total: 171
 Booked: 124
 Available: 47

 Some Movie - DRC Cinemas, Mysore
 Total: 151
 Booked: 48
 Available: 103
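
The bot currently prints its results. As a sketch of returning them in structured form instead, the tuples produced by `parse_all_theatres` can be turned into dictionaries, sorted by free seats, and serialised as JSON. `summarize` and the `occupancy_pct` field are illustrative additions; the figures are copied from the run shown above.

```python
import json

def summarize(parsed):
    """Turn (name, total, booked, available) tuples into dicts, most free seats first."""
    records = [
        {
            "theatre": name,
            "total": total,
            "booked": booked,
            "available": available,
            "occupancy_pct": round(100 * booked / total, 1) if total else 0.0,
        }
        for name, total, booked, available in parsed
    ]
    return sorted(records, key=lambda r: r["available"], reverse=True)

# Figures copied from the simulated run above
parsed = [
    ("Some Movie - Kingdome, Mysore", 154, 133, 21),
    ("Some Movie - Inox Forum, Mysore", 109, 20, 89),
    ("Some Movie - PVR Mall, Mysore", 171, 124, 47),
    ("Some Movie - DRC Cinemas, Mysore", 151, 48, 103),
]
print(json.dumps(summarize(parsed), indent=2))
```

Returning plain dictionaries keeps the display step separate from the parsing step, so the same data could feed a chat reply or an API response.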
