### Multi-turn chat evaluation for every pitch deck

In [36]:
from dotenv import load_dotenv
import os
from google import genai
from google.genai import types
from google.genai import chats
import time
import pandas as pd
load_dotenv()

True

### TIPSC Few-shot examples

In [37]:
TIPSC_FEW_SHOT_EXAMPLES = """
Example 1 - 
Pitch Statement:
"An AI-powered tool that detects early signs of diabetic foot ulcers using smartphone images, helping rural healthcare 
workers intervene before complications arise."

TIPSC Review:
Timely: Rising diabetes cases in rural areas make early detection critical. (Score: 5)
Important: Addresses a major healthcare gap affecting millions. (Score: 5)
Profitable: Strong market through health-tech startups and public health programs. (Score: 4)
Solvable: Feasible with current AI imaging and mobile tech. (Score: 5)
Contextual: Team has medical + AI expertise with NGO partnerships. (Score: 5)

Overall Assessment: Excellent (95%)
Brief Justification: The problem is urgent, large-scale, and solvable with current technology. 
The team’s alignment with healthcare stakeholders strengthens contextual fit and market potential.

Example 2 -
Pitch Statement:
"A wearable hydration tracker that reminds users to drink water based on real-time sweat analysis and weather conditions."

TIPSC Review:
Timely: Wellness tech is growing, though hydration-specific solutions are not urgent. (Score: 4)
Important: Moderate market among fitness and sports users. (Score: 4)
Profitable: Viable as a premium product but niche appeal. (Score: 4)
Solvable: Current sensors and IoT make it achievable. (Score: 5)
Contextual: Team has IoT experience but limited market understanding. (Score: 3)

Overall Assessment: Good (80%)
Brief Justification: A relevant and buildable solution with moderate market potential; success depends on positioning 
and user adoption beyond enthusiasts.

Example 3 - 
Pitch Statement:
"A desktop app to remind remote workers to stretch every 30 minutes and suggest exercises."

TIPSC Review:
Timely: The post-pandemic remote work trend is stabilizing. (Score: 3)
Important: Mildly useful but low perceived urgency. (Score: 3)
Profitable: Free alternatives exist; monetization unclear. (Score: 2)
Solvable: Technically simple; easy to build. (Score: 5)
Contextual: Team has coding skills but lacks health/UX expertise. (Score: 3)

Overall Assessment: Fair (60%)
Brief Justification: Simple, achievable idea with limited novelty and unclear market traction; 
lacks compelling urgency or differentiator.

Example 4 - 
Pitch Statement:
"An app that plays motivational quotes every hour to keep users positive."

TIPSC Review:

Timely: No clear trend or urgency for hourly motivational quotes. (Score: 2)
Important: Trivial problem with low impact. (Score: 2)
Profitable: Difficult to monetize; saturated with free apps. (Score: 1)
Solvable: Technically easy but adds little value. (Score: 4)
Contextual: Team lacks psychological or design expertise. (Score: 2)

Overall Assessment: Poor (40%)
Brief Justification: While easily implementable, the idea solves no pressing problem, 
lacks clear market differentiation, and shows weak contextual relevance.

Example 5 -
Pitch Statement:
"A low-cost smart inhaler system that tracks asthma medication usage, predicts attacks using environmental data, 
and alerts caregivers in real time."

TIPSC Review:

Timely: Asthma rates are increasing due to urban pollution; immediate relevance. (Score: 5)
Important: Critical for patients, families, and healthcare providers. (Score: 5)
Profitable: Strong potential for insurance tie-ins and health partnerships. (Score: 4)
Solvable: Current IoT + predictive AI make this feasible. (Score: 5)
Contextual: Team has biomedical and data analytics background. (Score: 5)

Overall Assessment: Excellent (95%)
Brief Justification: Urgent and impactful healthcare problem with a clear path to implementation and adoption; 
strong interdisciplinary team fit enhances feasibility and trust.

Example 6 - 
Pitch Statement:
"An AI chatbot that suggests eco-friendly alternatives when users shop online — like showing sustainable brands or 
second-hand options."

TIPSC Review:

Timely: Sustainability awareness is increasing but not yet mainstream behavior. (Score: 4)
Important: Appeals to a growing but niche eco-conscious segment. (Score: 4)
Profitable: Monetization possible via affiliate or brand partnerships. (Score: 4)
Solvable: Readily achievable using APIs and recommendation engines. (Score: 5)
Contextual: Team has AI experience but limited marketing background. (Score: 3)

Overall Assessment: Good (80%)
Brief Justification: Strong alignment with sustainability trends and implementable tech; 
moderate commercial potential limited by user behavior change barriers.

Example 7 -
Pitch Statement:
"A mobile app that helps people organize their daily to-do lists using colorful emojis and sound alerts to make productivity fun."

TIPSC Review:

Timely: Productivity apps remain evergreen but oversaturated. (Score: 3)
Important: Low differentiation; helps individuals but no major impact. (Score: 3)
Profitable: Hard to stand out in a crowded, free-app market. (Score: 2)
Solvable: Simple app; easily buildable with existing frameworks. (Score: 5)
Contextual: Team has beginner-level coding skills; limited UX experience. (Score: 3)

Overall Assessment: Fair (60%)
Brief Justification: Technically achievable but lacks novelty, urgency, and clear market pull; 
execution quality will determine limited success.

Example 8 -
Pitch Statement:
"An app that changes your phone wallpaper every hour to keep you inspired and motivated throughout the day."

TIPSC Review:

Timely: No identifiable need or trend driving this idea. (Score: 2)
Important: Minimal user impact; cosmetic value only. (Score: 2)
Profitable: No clear revenue stream or differentiator. (Score: 1)
Solvable: Very easy to build with existing APIs. (Score: 4)
Contextual: Team lacks direction and product reasoning. (Score: 2)

Overall Assessment: Poor (40%)
Brief Justification: Technically trivial concept with no significant need, value proposition, 
or sustainable market advantage; fails to meet hackathon impact criteria.
"""
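A quick consistency check on the examples above. In all eight, the stated overall percentage equals the sum of the five TIPSC scores as a share of the 25-point maximum, rounded down to the nearest 5. This is an observation inferred from the examples, not a rule the notebook states, and the helper name `percent_from_scores` is hypothetical.

```python
# Hypothetical helper: reproduces the overall percentages in the few-shot
# examples from their five TIPSC scores. The "round down to the nearest 5"
# rule is inferred from the examples, not stated anywhere in the notebook.
def percent_from_scores(scores, max_per_item=5):
    pct = 100 * sum(scores) // (max_per_item * len(scores))  # integer percentage
    return pct - pct % 5  # round down to a multiple of 5

# (scores in T, I, P, S, C order, stated overall percentage)
few_shot = [
    ([5, 5, 4, 5, 5], 95),  # Examples 1 and 5
    ([4, 4, 4, 5, 3], 80),  # Examples 2 and 6
    ([3, 3, 2, 5, 3], 60),  # Examples 3 and 7
    ([2, 2, 1, 4, 2], 40),  # Examples 4 and 8
]
for scores, stated in few_shot:
    assert percent_from_scores(scores) == stated
```

If a new example is added, running it through the same check is a cheap way to keep the few-shot set internally consistent.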

### Problem Evidence and Validation (Weightage : 30%)

In [38]:
def prompt_prob_evidence_val(problem_statement_text) :
    return f"""
        You are an expert evaluator for university hackathon pitch decks. Your task is to assess the Problem Evidence & Validation based on the rubric below.

        RUBRIC:
        - Excellent (90-100%): 10+ interviews with diverse stakeholders; multiple direct quotes; clear quantification of time/money impact
        - Good (70-89%): 5-9 interviews; some relevant quotes; basic quantification
        - Fair (50-69%): 3-4 interviews; limited evidence; vague numbers
        - Poor (0-49%): <3 interviews; no direct evidence; purely anecdotal

        The Problem Evidence and Validation content to evaluate is here:  {problem_statement_text}

        INSTRUCTIONS:
        1. Assign ONE category: Excellent, Good, Fair, or Poor
        2. Provide a 2-3 sentence justification citing specific evidence (or lack thereof) from the pitch deck
        3. Note the approximate number of interviews mentioned (if any)

        OUTPUT FORMAT:
        Category: [Excellent/Good/Fair/Poor]
        Justification: [Your short 2-3 sentence reasoning]
        Interview Count: [Number or "Not specified"]
        """


### Market Opportunity & Viability (Weightage : 20%)

In [39]:
def prompt_market_viability(market_opportunity_viability_text) :
   return f"""
      You are an expert evaluator for university hackathon pitch decks. Your task is to assess Market Opportunity & Viability 
      based on the rubric below.

      RUBRIC:
      - Excellent (90-100%): Clear TAM/SAM/SOM with credible sources; strong profitability argument; competitive gap identified
      - Good (70-89%): Basic market sizing; some business potential; mentions competitors
      - Fair (50-69%): Vague market references; unclear business model
      - Poor (0-49%): No market analysis; no commercial viability

      The Market Opportunity and Viability content to evaluate is here: {market_opportunity_viability_text}

      INSTRUCTIONS:
      1. Assign ONE category: Excellent, Good, Fair, or Poor
      2. Provide a 2-3 sentence justification focusing on:
         - Quality of market sizing (TAM/SAM/SOM presence and credibility)
         - Business model clarity
         - Competitive analysis depth
      3. Note if credible sources are cited for market data

      OUTPUT FORMAT:
      Category: [Excellent/Good/Fair/Poor]
      Justification: [Your short 2-3 sentence reasoning]
      Market Data Quality: [Strong/Moderate/Weak/Absent]
      """

### TIPSC (Weightage : 15%)

In [40]:
def prompt_tipsc(tipsc_text) :
   return f"""
      You are an expert evaluator for university hackathon pitch decks. Your task is to assess Problem Significance using 
      the TIPSC framework. 
      TIPSC means the following:
      T = Timely = Is the problem current and in need of an urgent solution or recently emergent and a solution can wait?
      I = Important = Does the solution or solving this problem matter to a large or key group of customers or market sectors/segments?
      P = Profitable = Will solving this problem yield Revenue or Value or a potential for these exist (even if limited)?
      S = Solvable = Is it possible to create a solution for this problem now given the technology and other required resources?
      C = Contextual = Is the current situation like team, policies, company, approach the right fit?

      Here are a few examples of how to evaluate or assess TIPSC: 
      {TIPSC_FEW_SHOT_EXAMPLES}

      RUBRIC:
      - Excellent (90-100%): Compelling urgency + major impact + clear team advantage + realistic solution path
      - Good (70-89%): Some timeliness + moderate impact + reasonable team fit
      - Fair (50-69%): Vague timing + minor impact + generic team fit
      - Poor (0-49%): No urgency + trivial problem + poor team fit

      The TIPSC Content to be evaluated is: {tipsc_text}

      INSTRUCTIONS:
      1. Assign ONE category: Excellent, Good, Fair, or Poor
      2. Provide a 2-3 sentence justification addressing:
         - Timeliness/urgency of the problem
         - Scale and severity of impact
         - Team's relevant advantage or expertise
         - Realism of proposed solution path
      3. Identify the strongest and weakest TIPSC element

      OUTPUT FORMAT:
      Category: [Excellent/Good/Fair/Poor]
      Justification: [Your reasoning]
      Strongest Element: [T/I/P/S/C]
      Weakest Element: [T/I/P/S/C]
   """

### Solution Direction & Value Proposition (Weightage : 15%)
### FOR RD SIR TO VERIFY

In [41]:
def prompt_solution(solution_value_prop) :
   return f"""
      You are an expert evaluator for university hackathon pitch decks. Your task is to assess Solution Direction & Value Proposition based on the rubric below.

      RUBRIC:
      - Excellent (90-100%): Clear solution hypothesis directly addressing gaps; strong unique value proposition
      - Good (70-89%): Basic solution direction; addresses some gaps
      - Fair (50-69%): Vague solution idea; weak value proposition
      - Poor (0-49%): No clear solution direction; copies existing solutions

      Solution Hypothesis of the Pitch Deck is here: {solution_value_prop}

      INSTRUCTIONS:
      1. Assign ONE category: Excellent, Good, Fair, or Poor
      2. Provide a 2-3 sentence justification focusing on:
         - Strength of value proposition
         - Real-world impact
      3. Note the most significant presentation strength or weakness

      OUTPUT FORMAT:
      Category: [Excellent/Good/Fair/Poor]
      Justification: [Your reasoning]
      Key Strength/Weakness: [Brief description]

      """

### Presentation & Cohesion (Weightage : 20%)

In [42]:
def prompt_pres_comp(presentation_cohesion) :
   return f"""
      You are an expert evaluator for university hackathon pitch decks. Your task is to assess Presentation & Cohesion based on the rubric below.

      RUBRIC:
      - Excellent (90-100%): Compelling narrative; logical flow; professional design; clear communication
      - Good (70-89%): Mostly coherent; decent design; some gaps in logic
      - Fair (50-69%): Disjointed arguments; basic design; confusing flow
      - Poor (0-49%): Incoherent story; poor design; unclear messaging

      Summary of the Pitch Deck is here: {presentation_cohesion}


      INSTRUCTIONS:
      1. Assign ONE category: Excellent, Good, Fair, or Poor
      2. Provide a 2-3 sentence justification focusing on:
         - Narrative coherence and logical flow
         - Clarity of communication
         - Overall professional quality
      3. Note the most significant presentation strength or weakness

      OUTPUT FORMAT:
      Category: [Excellent/Good/Fair/Poor]
      Justification: [Your reasoning]
      Key Strength/Weakness: [Brief description]

"""

In [43]:
pitch_decks_df = pd.read_csv("../EvaluateStudentIdeas/pitch_decks_cleaned.csv")

In [44]:
# Rename columns to replace spaces with underscores
pitch_decks_df.columns = pitch_decks_df.columns.str.replace(' ', '_', regex=False)
pitch_decks_df = pitch_decks_df.rename(columns={'Problem_Statement_(cleaned)' : 'Problem_Statement_Cleaned'})

In [45]:
pitch_decks_df

Unnamed: 0,Team_Name,Problem_Statement,Problem_Evidence,Market_Opportunity_Viability,TIPSC,Competition,Solution_Hypothesis,References,Problem_Statement_Cleaned
0,AquaSmart Innovations,Slide 1: The Problem & The Team - Team Name: A...,Slide 2: Evidence of Customer's Pain Point - K...,Slide 3: Quantifying the Problem - Market Size...,Slide 4: Why This Problem is TIPSC - Timely: C...,Slide 5: Competitive Landscape & The Gap - Cur...,Slide 6: Solution Hypothesis - Proposed Soluti...,Slide 7: Next Steps - Prototype testing in 500...,Core Problem Statement: Urban Water Conservati...
1,Triad_Kernals_Problem_Deck_2025 - Inchara K Ku...,Slide 1: The Problem & The Team Team Name: Tri...,Slide 2: Evidence of Customer’s Pain Point R&D...,Slide 3: Quantifying the Problem Market Size (...,Slide 4: Why This Problem is TIPSC (The Strate...,Slide 5: The Competitive Landscape & The Gap C...,Slide 6: The Solution Hypothesis (High-Level O...,"Slide 7: Appendix, References & Next Steps Our...",Core Problem Statement: Artisan Market Access ...
2,AgriSat Tech,Slide 1: The Problem & The Team - Team Name: A...,Slide 2: Evidence of Customer's Pain Point - K...,Slide 3: Quantifying the Problem - Market Size...,Slide 4: Why This Problem is TIPSC - Timely: I...,Slide 5: Competitive Landscape & The Gap - Cur...,Slide 6: Solution Hypothesis - Proposed Soluti...,Slide 7: Next Steps - Partner with ISRO for sa...,Core Problem Statement: Precision Agriculture ...
3,RuralConnect,Slide 1: The Problem & The Team - Team Name: R...,Slide 2: Evidence of Customer's Pain Point - K...,Slide 3: Quantifying the Problem - Market Size...,Slide 4: Why This Problem is TIPSC - Timely: D...,Slide 5: Competitive Landscape & The Gap - Cur...,Slide 6: Solution Hypothesis - Proposed Soluti...,Slide 7: Next Steps - Pilot in 50 villages acr...,Core Problem Statement: - Last Mile Rural Conn...


In [46]:
# client = genai.Client(api_key = os.getenv('GOOGLE_API_KEY'))

### LLM-evaluation of each idea while maintaining context

In [47]:
grade_cols = ['Team_Name', 'Problem_Evidence', 'Market_Opp_Viability', 'TIPSC', 'Solution_Dir_Val_Prop', 'Pres_Cohesion', 'Final_Score']
grade_df = pd.DataFrame(columns=grade_cols)

token_cols = ['Team_Name', 'Candidate_Tokens', 'Thought_Tokens', 'Input_Tokens', 'Output_Tokens', 'Total_Tokens']
token_df = pd.DataFrame(columns=token_cols)

justif_cols = ['Team_Name', 'Problem_Evidence', 'Market_Opp_Viability', 'TIPSC', 'Solution_Dir_Val_Prop', 'Pres_Cohesion']
justif_df = pd.DataFrame(columns=justif_cols)


In [48]:
def extract_grade_counts(res, token_counts) :
    '''Extract the word score (Excellent, Good, etc.) and token counts for a given response'''
    word_score = res.text.split("Category:")[1].split("\n")[0].strip()
    
    prompt_total = res.usage_metadata.prompt_token_count
    cand_total = res.usage_metadata.candidates_token_count
    thought_total = res.usage_metadata.thoughts_token_count

    if not cand_total :
        cand_total = 0
    if not thought_total :
        thought_total = 0
    
    token_counts[0] += prompt_total
    token_counts[1] += cand_total
    token_counts[2] += thought_total
    
    return word_score
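The `split("Category:")` lookup above raises `IndexError` when the label is missing and keeps any decoration the model adds, such as brackets or bold markers, which would then fail the `score_map` lookup later. A more tolerant parser could look like the sketch below. The name `parse_category` is hypothetical, and the decorated formats it handles are assumptions about what a model might return, not cases observed in this notebook.

```python
import re

# Matches "Category:" followed by optional punctuation/whitespace and one of
# the four rubric labels, case-insensitively.
_CATEGORY_RE = re.compile(r"Category:\W*(Excellent|Good|Fair|Poor)\b", re.IGNORECASE)

def parse_category(text):
    """Return 'Excellent', 'Good', 'Fair' or 'Poor', or None if no label is found."""
    match = _CATEGORY_RE.search(text)
    return match.group(1).capitalize() if match else None

# Assumed response shapes, for illustration only
print(parse_category("Category: [Excellent]\nJustification: ..."))
print(parse_category("**Category:** good (75%)"))
print(parse_category("No label in this response"))
```

Returning `None` instead of raising lets the caller decide whether to retry the request or skip the team.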

In [49]:
# # Iterate over every team's submission
# # split, strip is to take content excluding "Slide <num>"
# for teamNum, team in enumerate(pitch_decks_df.itertuples()):
#     grade_df.at[teamNum, 'Team_Name'] = team.Team_Name
#     token_df.at[teamNum, 'Team_Name'] = team.Team_Name
#     prompt_total, cand_total, thought_total = 0, 0, 0
#     token_counts = [prompt_total, cand_total, thought_total]
    
#     # Evaluate Metric #1
#     # Set problem_statement_text based on slides 1 & 2
#     ps_raw = "Core Problem Statement: " + team.Problem_Statement.split("Core Problem Statement:", 1)[1].strip() + "\n"
#     ps_evidence = team.Problem_Evidence.split("Slide", 1)[1].split(":", 1)[1].strip()
#     problem_statement_text = ps_raw + ps_evidence

#     PE_res = await chat.send_message(prompt_prob_evidence_val(problem_statement_text))
#     PE_word_score = extract_grade_counts(PE_res, token_counts)
#     grade_df.at[teamNum, 'Problem_Evidence'] = PE_word_score


#     # Evaluate Metric #2
#     market_opportunity_viability_text = team.Market_Opportunity_Viability.split("Slide", 1)[1].split(":", 1)[1].strip()
#     MOV_res = await chat.send_message(prompt_market_viability(market_opportunity_viability_text))
#     MOV_word_score = extract_grade_counts(MOV_res, token_counts)
#     grade_df.at[teamNum, 'Market_Opp_Viability'] = MOV_word_score

#     # Evaluate Metric #3
#     tipsc_text = "Timely: " + team.TIPSC.split("Timely", 1)[1].split(":", 1)[1].strip()
#     TIPSC_res = await chat.send_message(prompt_tipsc(tipsc_text))
#     TIPSC_word_score = extract_grade_counts(TIPSC_res, token_counts)
#     grade_df.at[teamNum, 'TIPSC'] = TIPSC_word_score

#     # Evaluate Metric #4 - REMOVE REDUNDANT WORDS HERE.
#     solution_value_prop = team.Solution_Hypothesis.split("Slide", 1)[1].split(":", 1)[1].strip()
#     sol_res = await chat.send_message(prompt_solution(solution_value_prop))
#     sol_word_score = extract_grade_counts(sol_res, token_counts)
#     grade_df.at[teamNum, 'Solution_Dir_Val_Prop'] = sol_word_score

#     # Evaluate Metric #5
#     presentation_cohesion = team.Problem_Statement_Cleaned
#     cohesion_res = await chat.send_message(prompt_pres_comp(presentation_cohesion))
#     cohesion_word_score = extract_grade_counts(cohesion_res, token_counts)
#     grade_df.at[teamNum, 'Pres_Cohesion'] = cohesion_word_score

#     # Token stats for this pitch
#     token_df.at[teamNum, 'Input_Tokens'] = token_counts[0]
#     token_df.at[teamNum, 'Candidate_Tokens'] = token_counts[1]
#     token_df.at[teamNum, 'Thought_Tokens'] = token_counts[2]
#     token_df.at[teamNum, 'Output_Tokens'] = token_counts[1] + token_counts[2]
#     token_df.at[teamNum, 'Total_Tokens'] = sum(token_counts)


In [50]:
token_df

Unnamed: 0,Team_Name,Candidate_Tokens,Thought_Tokens,Input_Tokens,Output_Tokens,Total_Tokens


### Get final scores for each idea using weightages

In [51]:
# Word Score to Decimal Score mapping
score_map = {'Poor' : 0.25,
             'Fair' : 0.5,
             'Good' : 0.75,
             'Excellent' : 1
            }

In [52]:
for teamNum, team in enumerate(grade_df.itertuples()) :
    grade_df.at[teamNum, 'Final_Score'] = 0.3 * score_map[team.Problem_Evidence] + \
                                        0.2 * score_map[team.Market_Opp_Viability] + \
                                        0.15 * score_map[team.TIPSC] + \
                                        0.15 * score_map[team.Solution_Dir_Val_Prop] + \
                                        0.2 * score_map[team.Pres_Cohesion]
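As a worked example of the weighting above, the sketch below applies the same category-to-decimal mapping and the same five weights to one made-up set of labels. The labels in `example` are illustrative only and do not come from any team in this notebook; `weighted_score` is a hypothetical helper name.

```python
SCORE_MAP = {"Poor": 0.25, "Fair": 0.5, "Good": 0.75, "Excellent": 1.0}

# Weights as given in the section headings (30/20/15/15/20)
WEIGHTS = {
    "Problem_Evidence": 0.30,
    "Market_Opp_Viability": 0.20,
    "TIPSC": 0.15,
    "Solution_Dir_Val_Prop": 0.15,
    "Pres_Cohesion": 0.20,
}

def weighted_score(labels):
    """Weighted sum of the decimal scores for one team's category labels."""
    return sum(WEIGHTS[metric] * SCORE_MAP[labels[metric]] for metric in WEIGHTS)

# Illustrative labels, not real results
example = {
    "Problem_Evidence": "Excellent",
    "Market_Opp_Viability": "Good",
    "TIPSC": "Good",
    "Solution_Dir_Val_Prop": "Fair",
    "Pres_Cohesion": "Excellent",
}
# 0.3*1 + 0.2*0.75 + 0.15*0.75 + 0.15*0.5 + 0.2*1 = 0.8375
print(round(weighted_score(example), 4))
```

Because the weights sum to 1, the final score stays between 0.25 (all Poor) and 1 (all Excellent).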
    

In [53]:
grade_df

Unnamed: 0,Team_Name,Problem_Evidence,Market_Opp_Viability,TIPSC,Solution_Dir_Val_Prop,Pres_Cohesion,Final_Score


In [54]:
token_df

Unnamed: 0,Team_Name,Candidate_Tokens,Thought_Tokens,Input_Tokens,Output_Tokens,Total_Tokens


# Revised criteria
1. Integer on scale of 1-10
2. 2-3 line justification

In [55]:
import asyncio

In [56]:
def getPrompt(rubrik, text) :
    return f"""
            Rubrik: {rubrik}
            Text: {text}
    """

In [57]:
def extract_num_score_justif(res, token_counts) :
    num_score = res.text.split("Score:")[1].split("\n")[0].strip()
    justification = res.text.split("Justification:")[1].strip()
    
    prompt_total = res.usage_metadata.prompt_token_count
    cand_total = res.usage_metadata.candidates_token_count
    thought_total = res.usage_metadata.thoughts_token_count

    if not cand_total :
        cand_total = 0
    if not thought_total :
        thought_total = 0
    
    token_counts[0] += prompt_total
    token_counts[1] += cand_total
    token_counts[2] += thought_total
    
    return int(num_score), justification
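`int(num_score)` above fails if the model answers with something like `9/10` or wraps the number in bold markers. The sketch below is a more forgiving parser that also checks the 1 to 10 range. The name `parse_score_and_justification` is hypothetical, and the alternative response shapes are assumptions rather than outputs seen in this notebook.

```python
import re

def parse_score_and_justification(text, low=1, high=10):
    """Extract an integer score and the justification text from a response."""
    # \W* skips spaces and markdown punctuation between the label and the value
    score_match = re.search(r"Score:\W*(\d+)", text)
    justif_match = re.search(r"Justification:\W*(.*)", text, re.DOTALL)
    if score_match is None or justif_match is None:
        raise ValueError("response does not follow the Score/Justification format")
    score = int(score_match.group(1))
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside {low}-{high}")
    return score, justif_match.group(1).strip()

# Assumed response shape, for illustration only
sample = "- Score: 9/10\n- Justification: Strong evidence.\nClear numbers."
print(parse_score_and_justification(sample))
```

Raising `ValueError` on a malformed response makes it straightforward to catch the failure and resend the prompt.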

In [58]:
client = genai.Client(api_key = os.getenv('GOOGLE_API_KEY'))

In [None]:
# Iterate over every team's submission
# The split/strip calls drop the leading "Slide <num>:" prefix from each field
for teamNum, team in enumerate(pitch_decks_df.itertuples()):
    grade_df.at[teamNum, 'Team_Name'] = team.Team_Name
    token_df.at[teamNum, 'Team_Name'] = team.Team_Name
    justif_df.at[teamNum, 'Team_Name'] = team.Team_Name
    prompt_total, cand_total, thought_total = 0, 0, 0
    token_counts = [prompt_total, cand_total, thought_total]

    chat = client.aio.chats.create(
        model='gemini-2.5-flash-preview-09-2025',
        config = types.GenerateContentConfig(
            system_instruction="""
                You are an expert evaluator for university hackathon pitch decks.
                
                Your task is to rate different rubriks with an INTEGER on a scale of 1 to 10,
                1 being the lowest and 10 the highest.

                Use the following to evaluate the TIPSC rubrik :
                T = Timely = Is the problem current and in need of an urgent solution or recently emergent and a solution can wait?
                I = Important = Does the solution or solving this problem matter to a large or key group of customers or market sectors/segments?
                P = Profitable = Will solving this problem yield Revenue or Value or a potential for these exist (even if limited)?
                S = Solvable = Is it possible to create a solution for this problem now given the technology and other required resources?
                C = Contextual = Is the current situation like team, policies, company, approach the right fit?

                Input Format :
                - Rubrik:
                - Text:

                Output Format :
                - Score:
                - Justification: (In 2 lines)
            """
        ))
        
    # Evaluate Metric #1
    # Set problem_statement_text based on slides 1 & 2
    ps_raw = "Core Problem Statement: " + team.Problem_Statement.split("Core Problem Statement:", 1)[1].strip() + "\n"
    ps_evidence = team.Problem_Evidence.split("Slide", 1)[1].split(":", 1)[1].strip()
    problem_statement_text = ps_raw + ps_evidence

    PE_res = await chat.send_message(getPrompt(rubrik='Problem Evidence & Validation', text=problem_statement_text))
    await asyncio.sleep(2)  # non-blocking pause between requests
    PE_num_score, justif = extract_num_score_justif(PE_res, token_counts)
    grade_df.at[teamNum, 'Problem_Evidence'] = PE_num_score
    justif_df.at[teamNum, 'Problem_Evidence'] = justif


    # Evaluate Metric #2
    market_opportunity_viability_text = team.Market_Opportunity_Viability.split("Slide", 1)[1].split(":", 1)[1].strip()
    MOV_res = await chat.send_message(getPrompt(rubrik='Market Opportunity & Viability', text=market_opportunity_viability_text))
    await asyncio.sleep(2)
    MOV_num_score, justif = extract_num_score_justif(MOV_res, token_counts)
    grade_df.at[teamNum, 'Market_Opp_Viability'] = MOV_num_score
    justif_df.at[teamNum, 'Market_Opp_Viability'] = justif

    # Evaluate Metric #3
    tipsc_text = "Timely: " + team.TIPSC.split("Timely", 1)[1].split(":", 1)[1].strip()
    TIPSC_res = await chat.send_message(getPrompt(rubrik='TIPSC', text=tipsc_text))
    await asyncio.sleep(2)
    TIPSC_num_score, justif = extract_num_score_justif(TIPSC_res, token_counts)
    grade_df.at[teamNum, 'TIPSC'] = TIPSC_num_score
    justif_df.at[teamNum, 'TIPSC'] = justif

    # Evaluate Metric #4 - REMOVE REDUNDANT WORDS HERE.
    solution_value_prop = team.Solution_Hypothesis.split("Slide", 1)[1].split(":", 1)[1].strip()
    sol_res = await chat.send_message(getPrompt(rubrik='Solution Direction & Value Proposition', text=solution_value_prop))
    await asyncio.sleep(2)
    sol_num_score, justif = extract_num_score_justif(sol_res, token_counts)
    grade_df.at[teamNum, 'Solution_Dir_Val_Prop'] = sol_num_score
    justif_df.at[teamNum, 'Solution_Dir_Val_Prop'] = justif

    # Evaluate Metric #5
    presentation_cohesion = team.Problem_Statement_Cleaned
    cohesion_res = await chat.send_message(getPrompt(rubrik='Presentation & Communication', text=presentation_cohesion))
    await asyncio.sleep(2)
    cohesion_num_score, justif = extract_num_score_justif(cohesion_res, token_counts)
    grade_df.at[teamNum, 'Pres_Cohesion'] = cohesion_num_score
    justif_df.at[teamNum, 'Pres_Cohesion'] = justif

    # Token stats for this pitch
    token_df.at[teamNum, 'Input_Tokens'] = token_counts[0]
    token_df.at[teamNum, 'Candidate_Tokens'] = token_counts[1]
    token_df.at[teamNum, 'Thought_Tokens'] = token_counts[2]
    token_df.at[teamNum, 'Output_Tokens'] = token_counts[1] + token_counts[2]
    token_df.at[teamNum, 'Total_Tokens'] = sum(token_counts)

    await asyncio.sleep(10)  # longer pause before starting the next team's chat
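The pauses between calls in the loop above presumably exist to stay under API rate limits (an assumption; the notebook does not say). If a request still fails, one option is to wrap each call in a small retry helper with exponential backoff. The sketch below is generic: `send_with_retry` is a hypothetical name, and it retries on any exception because the notebook does not show which errors the client raises.

```python
import asyncio

async def send_with_retry(send, prompt, retries=3, base_delay=2.0):
    """Await send(prompt); on failure wait base_delay * 2**attempt and retry."""
    for attempt in range(retries):
        try:
            return await send(prompt)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)

# Stand-in for an async send function that fails twice, then succeeds
calls = {"n": 0}

async def flaky_send(prompt):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return f"ok: {prompt}"

result = asyncio.run(send_with_retry(flaky_send, "hello", base_delay=0))
print(result, calls["n"])
```

Inside the notebook loop, where top-level `await` is available, the call would take the form `await send_with_retry(chat.send_message, prompt)` rather than `asyncio.run(...)`.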


In [62]:
for teamNum, team in enumerate(grade_df.itertuples()) :
    grade_df.at[teamNum, 'Final_Score'] = 0.3 * team.Problem_Evidence + \
                                        0.2 * team.Market_Opp_Viability + \
                                        0.15 * team.TIPSC + \
                                        0.15 * team.Solution_Dir_Val_Prop + \
                                        0.2 * team.Pres_Cohesion

### Average input and output tokens calculation

In [63]:
result_df = pd.DataFrame({
    'Avg_Input' : [token_df['Input_Tokens'].mean()],
    'Avg_Output' : [token_df['Output_Tokens'].mean()],
    'Avg_Total' : [token_df['Total_Tokens'].mean()],
})

In [68]:
print("Total tokens for all teams = ", token_df['Total_Tokens'].sum())

Total tokens for all teams =  23340


In [61]:
token_df

Unnamed: 0,Team_Name,Candidate_Tokens,Thought_Tokens,Input_Tokens,Output_Tokens,Total_Tokens
0,AquaSmart Innovations,283,1382,3267,1665,4932
1,Triad_Kernals_Problem_Deck_2025 - Inchara K Ku...,351,1544,6147,1895,8042
2,AgriSat Tech,310,1961,3299,2271,5570
3,RuralConnect,289,1164,3343,1453,4796
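The derived columns and the summary figures can be re-checked from the three raw counts in the table above. The sketch below rebuilds the table by row index (team names omitted) and recomputes `Output_Tokens`, `Total_Tokens`, the grand total and the averages; the variable name `tokens` is only for this illustration.

```python
import pandas as pd

# Raw counts copied from the token_df output, rows 0-3
tokens = pd.DataFrame({
    "Candidate_Tokens": [283, 351, 310, 289],
    "Thought_Tokens": [1382, 1544, 1961, 1164],
    "Input_Tokens": [3267, 6147, 3299, 3343],
})

# Same definitions as in the evaluation loop
tokens["Output_Tokens"] = tokens["Candidate_Tokens"] + tokens["Thought_Tokens"]
tokens["Total_Tokens"] = tokens["Input_Tokens"] + tokens["Output_Tokens"]

print(tokens["Total_Tokens"].sum())  # 23340
print(tokens[["Input_Tokens", "Output_Tokens", "Total_Tokens"]].mean().tolist())
# [4014.0, 1821.0, 5835.0]
```

Both results agree with the printed total and with `result_df`.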


In [66]:
grade_df

Unnamed: 0,Team_Name,Problem_Evidence,Market_Opp_Viability,TIPSC,Solution_Dir_Val_Prop,Pres_Cohesion,Final_Score
0,AquaSmart Innovations,10,8,10,9,10,9.45
1,Triad_Kernals_Problem_Deck_2025 - Inchara K Ku...,10,8,9,9,9,9.1
2,AgriSat Tech,10,9,9,10,10,9.65
3,RuralConnect,9,8,9,7,9,8.5
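The `Final_Score` column above can be reproduced without a row loop by multiplying the five score columns by their weights and summing across each row. The sketch below does this with the integer scores copied from the table, indexed by row number; the names `WEIGHTS`, `grades` and `final` are chosen for this illustration.

```python
import pandas as pd

# Weights from the section headings (30/20/15/15/20)
WEIGHTS = pd.Series({
    "Problem_Evidence": 0.30,
    "Market_Opp_Viability": 0.20,
    "TIPSC": 0.15,
    "Solution_Dir_Val_Prop": 0.15,
    "Pres_Cohesion": 0.20,
})

# Scores copied from the grade_df output, rows 0-3
grades = pd.DataFrame({
    "Problem_Evidence": [10, 10, 10, 9],
    "Market_Opp_Viability": [8, 8, 9, 8],
    "TIPSC": [10, 9, 9, 9],
    "Solution_Dir_Val_Prop": [9, 9, 10, 7],
    "Pres_Cohesion": [10, 9, 10, 9],
})

# Column-wise multiply by the weights, then sum each row
final = grades[WEIGHTS.index].mul(WEIGHTS, axis="columns").sum(axis=1).round(2)
print(final.tolist())  # [9.45, 9.1, 9.65, 8.5]
```

For row 0, the arithmetic is 0.3·10 + 0.2·8 + 0.15·10 + 0.15·9 + 0.2·10 = 9.45, matching the table.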


In [67]:
result_df

Unnamed: 0,Avg_Input,Avg_Output,Avg_Total
0,4014.0,1821.0,5835.0


In [65]:
justif_df

Unnamed: 0,Team_Name,Problem_Evidence,Market_Opp_Viability,TIPSC,Solution_Dir_Val_Prop,Pres_Cohesion
0,,"The evidence is robust, quantifying the proble...","The market sizing (TAM, SAM, SOM) is well-defi...",All five criteria are maximally satisfied: the...,"The solution is comprehensive, combining IoT h...","The communication is exceptionally clear, star..."
1,,The team excelled by quantifying the annual in...,The market size (TAM/SAM/SOM) is clearly quant...,"The pitch robustly validates T, I, and P with ...","The value proposition is clear, directly addre...",The core problem statement is immediately impa...
2,,The problem is robustly quantified with a mass...,The market sizing (TAM/SAM/SOM) is clearly def...,The proposal scores highly due to strong align...,"The solution provides a holistic, multi-layere...","The communication is outstanding, immediately ..."
3,,The problem is quantified effectively using bo...,"The market sizing (TAM, SAM, SOM) is clearly d...","All TIPSC components are highly aligned, parti...","The solution uses relevant, distributed techno...","The communication is exceptionally clear, star..."
