# Advanced Commit Search

#### Env Variables

This configuration variable controls how vectors are written to the vector database:

- **VECTOR_DB_INSERT_BATCH_SIZE**: Number of vectors to insert into Qdrant in each batch operation

Inserting in fixed-size batches caps the size of each request sent to Qdrant.

In [1]:
VECTOR_DB_INSERT_BATCH_SIZE = 100

#### Imports and Downloads

This section imports all the necessary libraries and downloads required NLTK data:

**Core Libraries:**
- `re`: Regular expressions for text processing
- `nltk`: Natural Language Toolkit for sentence tokenization
- `xml.etree.ElementTree`: XML parsing capabilities
- `tqdm`: Progress bars for long-running operations
- `os`: File system operations
- `sqlite3`: SQLite database operations
- `subprocess`: Running shell commands

**Machine Learning & Vector Database:**
- `sentence_transformers`: Convert text to semantic embeddings
- `qdrant_client`: Vector database for similarity search
- `qdrant_client.models`: Data structures for vector operations

**NLTK Downloads:**
- `punkt`: Sentence tokenizer models
- `punkt_tab`: Additional tokenization data

These libraries enable the complete pipeline from data extraction to semantic search.

In [2]:
import re
import nltk
import xml.etree.ElementTree as ET
from tqdm import tqdm
import os
from qdrant_client.http.models import Filter, FieldCondition, MatchValue
import sqlite3
import subprocess
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

nltk.download('punkt')
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

## 1- Download Commit Messages

This section executes the shell script to download commit messages from Git repositories:

**Process Flow:**
1. **Script Execution**: Runs `export_commit_messages.sh` using subprocess
2. **Real-time Output**: Streams stdout in real-time to show progress
3. **Error Handling**: Captures and displays any errors from stderr
4. **Exit Validation**: Checks return code to ensure successful completion

The script downloads commit data and converts it to XML format for further processing. This step is essential for gathering the raw data that will be processed into embeddings.

In [None]:
process = subprocess.Popen(['bash', 'export_commit_messages.sh'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

for line in process.stdout:
    print(line, end='')

process.wait()
if process.returncode != 0:
    stderr = process.stderr.read()
    print("Error downloading commit messages:", stderr)
    exit(1)
print("Commit messages downloaded successfully.")
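The cell above reads stdout to the end before it touches stderr, so a script that writes a lot to stderr could fill that pipe and stall. Below is a minimal sketch of one way around this, assuming it is acceptable to interleave stderr with stdout. `run_streaming` is a hypothetical helper, and the demo command is only a stand-in for `export_commit_messages.sh`.

```python
import subprocess
import sys


def run_streaming(cmd):
    """Run cmd, echo its combined stdout/stderr line by line, return (code, lines)."""
    lines = []
    # stderr=subprocess.STDOUT merges both streams into one pipe, so there is
    # only a single pipe to drain and neither stream can block the other.
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as process:
        for line in process.stdout:
            print(line, end="")
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here.
    return process.returncode, lines


# Stand-in command for the demo; in the notebook this would be
# ['bash', 'export_commit_messages.sh'].
code, output = run_streaming(
    [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"]
)
```

The trade-off is that error text is no longer separated from normal output; the return code remains the signal for success or failure.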


## 2- Validate XML Files with XSD

## !!! You will see an error in the carbon-lang repository at commit afa265dd5710a81bda1ad81fc992f5441e9a15da due to invalid characters in the commit message. Remove the invalid characters and re-run the cell.

This step validates the downloaded XML files against a predefined XSD schema:

**Validation Process:**
1. **Schema Validation**: Uses `validate-xml.sh` to check XML structure
2. **Error Detection**: Identifies malformed XML or invalid characters
3. **Data Quality**: Ensures XML files meet expected format requirements
4. **Pre-processing Check**: Validates data before proceeding to parsing

**Common Issues:**
- Invalid characters in commit messages (like control characters)
- Malformed XML structure
- Encoding problems

XML validation is crucial for preventing parsing errors in subsequent steps.

In [4]:
result = subprocess.run(['bash', 'validate-xml.sh'], capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    print("Error validating XML file:", result.stderr)
    exit(1)
print("XML file validated successfully.")

XML Validation Script
Schema file: repository-commit-schema.xsd
XML directory: XML_commit_messages

Validating schema file...
✓ Schema file is valid

Validating XML files...

Validating carbon-lang_commits.xml... ✓ VALID
Validating pyrefly_commits.xml... ✓ VALID

Validation Summary:
  Files found: 2
  Valid files: 2
  Invalid files: 0

✓ All XML files are valid according to the schema

XML file validated successfully.
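For the "invalid characters" issue listed above, here is a minimal sketch of removing characters that XML 1.0 does not allow. It assumes the offending characters are control characters or other code points outside the XML 1.0 character ranges; `strip_invalid_xml_chars` is a hypothetical helper, not part of the notebook's scripts, and the sample message is made up.

```python
import re

# Matches every character outside the XML 1.0 "Char" production, i.e. anything
# other than tab, newline, carriage return, and the allowed Unicode ranges.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Return text with characters that are illegal in XML 1.0 removed."""
    return _INVALID_XML_CHARS.sub("", text)


# Made-up commit message containing a NUL byte and a vertical tab.
cleaned = strip_invalid_xml_chars("Fix parser\x00 crash\x0b on\ttabs")
```

Tabs and newlines survive because XML 1.0 permits them; only the disallowed code points are dropped.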


## 3- Create Qdrant Collection

### XML Parser

In [5]:
def parse_commits(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    repo_url = root.find('url').text if root.find('url') is not None else None
    repo_name = root.find('name').text if root.find('name') is not None else None
    commits = []
    print(repo_url, repo_name)
    for commit in tqdm(root.findall('.//commit'), desc="Parsing commits"):
        message = commit.find('message').text
        author = commit.find('author').text
        date = commit.find('date').text
        hash = commit.find('hash').text
        commits.append({"message": message,
                        "author": author,
                        "date": date,
                        "hash": hash,
                        "repo_url": repo_url,
                        "repo_name": repo_name})
    return commits

Parse XML files containing commit data and extract structured information:

1. **XML parsing**: Use ElementTree to parse the XML structure
2. **Repository metadata**: Extract repository URL and name from the root element
3. **Commit iteration**: Loop through all commit elements with a progress bar
4. **Data extraction**: Extract message, author, date, and hash for each commit
5. **Return structured data**: Create a list of dictionaries with all commit information

The function returns a standardized format that can be easily processed by other parts of the pipeline.
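The parser above calls `.text` directly, which yields `None` for an empty element and raises `AttributeError` for a missing one. The following is a minimal sketch of a more defensive variant using `findtext`, run against an in-memory sample. The root tag name `repository`, the sample values, and the function name `parse_commits_from_string` are assumptions made for illustration; only the child element names come from the parser above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the root tag name is an assumption.
SAMPLE_XML = """<repository>
  <url>https://example.com/demo.git</url>
  <name>demo</name>
  <commits>
    <commit>
      <hash>abc123</hash>
      <author>dev@example.com</author>
      <date>2024-01-02</date>
      <message>Fix off-by-one error</message>
    </commit>
    <commit>
      <hash>def456</hash>
      <author>dev@example.com</author>
      <date>2024-01-03</date>
      <message></message>
    </commit>
  </commits>
</repository>"""


def parse_commits_from_string(xml_text):
    """Parse commit records, substituting "" for missing or empty fields."""
    root = ET.fromstring(xml_text)
    repo_url = root.findtext("url")
    repo_name = root.findtext("name")
    commits = []
    for commit in root.iterfind(".//commit"):
        commits.append({
            # findtext returns the default for a missing element and ""
            # for an element without text, so no field ends up as None.
            "message": commit.findtext("message", default=""),
            "author": commit.findtext("author", default=""),
            "date": commit.findtext("date", default=""),
            "hash": commit.findtext("hash", default=""),
            "repo_url": repo_url,
            "repo_name": repo_name,
        })
    return commits


sample_commits = parse_commits_from_string(SAMPLE_XML)
```

An empty string is safer to pass to the embedding step than `None`, though whether empty messages should be embedded at all is a separate choice.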

In [6]:
commits = []
xml_dir = 'XML_commit_messages'
for filename in os.listdir(xml_dir):
    if filename.endswith('.xml'):
        file_path = os.path.join(xml_dir, filename)
        commits.extend(parse_commits(file_path))

https://github.com/facebook/pyrefly.git pyrefly


Parsing commits: 100%|██████████| 6531/6531 [00:00<00:00, 1083841.08it/s]



https://github.com/carbon-language/carbon-lang.git carbon-lang


Parsing commits: 100%|██████████| 4182/4182 [00:00<00:00, 735423.22it/s]


Process all XML files in the XML_commit_messages directory and combine their commit data into a single list. This allows us to work with commits from multiple repositories in a unified way.

**Multi-Repository Processing:**

This code iterates through all XML files in the `XML_commit_messages` directory and processes them:

1. **Directory Scanning**: Lists all files in the XML directory
2. **File Filtering**: Only processes files with `.xml` extension
3. **Batch Processing**: Calls `parse_commits()` for each XML file
4. **Data Aggregation**: Combines commits from all repositories into a single list

This approach lets the pipeline handle multiple Git repositories in a single run and produces one unified dataset for analysis. Each repository's commits are parsed in turn and appended to the master `commits` list.
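`os.listdir` returns entries in arbitrary order, so the order in which repositories are parsed can differ between machines. If a stable order matters (for example, because point IDs are later assigned by position in the list), one option is to sort the file list. This is a minimal sketch with a hypothetical helper `list_xml_files` and made-up file names in a throwaway directory.

```python
import tempfile
from pathlib import Path


def list_xml_files(xml_dir):
    """Return the .xml files in xml_dir, sorted by name for a stable order."""
    return sorted(Path(xml_dir).glob("*.xml"))


# Demo in a temporary directory; the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b_commits.xml", "a_commits.xml", "notes.txt"):
        Path(tmp, name).write_text("<repository/>")
    found = [path.name for path in list_xml_files(tmp)]
```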

### Generate Embeddings

In [7]:
model_name = 'multi-qa-MiniLM-L6-cos-v1'
print(f"Loading model {model_name} ...")
model = SentenceTransformer(model_name)  # better for query → doc
print("Generating embeddings...")

Loading model multi-qa-MiniLM-L6-cos-v1 ...
Generating embeddings...


Load the SentenceTransformer model for generating embeddings:

- **Model choice**: `multi-qa-MiniLM-L6-cos-v1` is optimized for query-to-document similarity
- **Size**: Produces 384-dimensional vectors
- **Performance**: Good balance between speed and quality for semantic search tasks

This model will convert commit messages into numerical vectors that capture their semantic meaning.
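Similarity between two such vectors is measured here with cosine similarity, the dot product divided by the product of the vector lengths. The following pure-Python sketch uses toy 2-dimensional vectors as stand-ins for the 384-dimensional embeddings; `cosine_similarity` is a hypothetical helper written for illustration, not a library call.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (equal-length sequences)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score 1; perpendicular vectors score 0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because only the angle matters, a long and a short commit message can still score highly when they point in the same semantic direction.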

### Set Up Qdrant & Store Embeddings

In [8]:
client = QdrantClient(host='qdrant', port=6333)

# Recreate the collection from scratch so that reruns start empty
if client.collection_exists(collection_name="commits"):
    client.delete_collection(collection_name="commits")
client.create_collection(
    collection_name="commits",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)


# Encode every commit message and attach the commit metadata as payload
points = [
    PointStruct(
        id=i,
        vector=model.encode(c['message'], convert_to_numpy=True).tolist(),
        payload={
            "hash": c['hash'],
            "author": c['author'],
            "date": c['date'],
            "message": c['message'],
            "repo_url": c['repo_url'],
            "repo_name": c['repo_name'],
        },
    )
    for i, c in enumerate(commits)
]


for i in tqdm(range(0, len(points), VECTOR_DB_INSERT_BATCH_SIZE), desc="Upserting to Qdrant"):
    batch = points[i:i+VECTOR_DB_INSERT_BATCH_SIZE]
    client.upsert(collection_name="commits", points=batch)

Upserting to Qdrant: 100%|██████████| 108/108 [00:03<00:00, 35.08it/s]


Set up the Qdrant vector database and store commit embeddings:

1. **Collection management**: Create or recreate the "commits" collection
2. **Vector configuration**: 384-dimensional vectors with cosine similarity
3. **Embedding generation**: Convert each commit message to a vector
4. **Batch insertion**: Insert vectors in batches for better performance
5. **Metadata storage**: Store commit hash, author, date, message, repository URL, and repository name as payload

This creates a searchable vector database where we can find semantically similar commits.
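The batch arithmetic can be checked by hand. The parsing output above reports 6531 and 4182 commits, and with a batch size of 100 that gives 108 batches, which matches the 108 steps in the upsert progress bar. The sketch below uses a hypothetical helper `iter_batches` and plain integers as stand-ins for the points.

```python
import math


def iter_batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


total_commits = 6531 + 4182  # commit counts from the parsing output
batch_count = math.ceil(total_commits / 100)

# Integers stand in for the PointStruct objects.
batches = list(iter_batches(list(range(total_commits)), 100))
last_batch_size = len(batches[-1])
```

The final batch is smaller than the rest (13 points here), which slicing handles without any special case.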

### Embed User Input

In [9]:
def embed_user_query(query):
    return model.encode(query, convert_to_numpy=True)

Convert user search queries into embeddings using the same model used for commit messages. This ensures that queries and documents exist in the same vector space for accurate similarity comparison.

### Search Qdrant for Similar Commit Messages

In [10]:
def search_similar_commits(query, score_threshold=0.5):
    vector = embed_user_query(query)
    results = client.query_points(
        collection_name="commits",
        query=vector.tolist(),
        with_payload=True,
    )
    if not results.points:
        return "No similar commit found."
    
    # Filter results by score threshold
    filtered_results = [res for res in results.points if res.score > score_threshold]
    if not filtered_results:
        return "No similar commit found above the score threshold."
    # Extract payload and add score to each result
    filtered_results = [
        {**x.payload, 'score': x.score} 
        for x in filtered_results
    ]
    return filtered_results

### Example Usage

**Semantic Search Function:**

This function performs vector-based similarity search on commit messages:

**Process Flow:**
1. **Query Embedding**: Convert the user's search query into a vector using the same model
2. **Vector Search**: Use Qdrant's `query_points` method to find the most similar commit message vectors
3. **Payload Inclusion**: Include commit metadata (hash, author, date, message, repository URL and name) in results
4. **Score Threshold**: Filter results to only include those above a configurable similarity score
5. **Fallback**: Return a message string instead of a list if no similar commits are found

**Key Features:**
- Uses cosine similarity to measure semantic closeness
- Returns structured results with similarity scores
- Maintains consistent vector space between queries and documents
- Score threshold can be adjusted for more/less strict matching
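Note that the function above returns a string when nothing matches, so any caller that loops over the result has to check the type first. Below is a minimal sketch of the filtering step that always returns a list instead. `Hit` is a hypothetical stand-in for a scored search result, and `filter_hits` is not part of the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical stand-in for a scored point returned by the vector search."""
    score: float
    payload: dict = field(default_factory=dict)


def filter_hits(hits, score_threshold=0.5):
    """Keep hits scoring above the threshold, flattened to payload plus score."""
    return [
        {**hit.payload, "score": hit.score}
        for hit in hits
        if hit.score > score_threshold
    ]


hits = [
    Hit(0.67, {"hash": "aaa"}),
    Hit(0.45, {"hash": "bbb"}),
    Hit(0.30, {"hash": "ccc"}),
]
kept = filter_hits(hits, score_threshold=0.4)
nothing = filter_hits(hits, score_threshold=0.9)  # empty list, not a message
```

Returning an empty list keeps the return type uniform; a caller that wants a message can print one when the list is empty.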

In [11]:
similarity_commits = search_similar_commits("bug fix", score_threshold=0.4)


**Demo Search Query:**

This example demonstrates the semantic search functionality by searching for commits related to "bug fix":

**What This Does:**
- Converts "bug fix" into a vector representation
- Uses Qdrant's `query_points` method to search the commit database for semantically similar messages
- Retrieves the 10 most similar commits with their metadata
- Filters results using the score threshold (0.4 in this call, overriding the function's default of 0.5)
- Attaches a similarity score to each result, indicating how closely the commit matches the query

**Expected Results:**
- Commits containing words like "fix", "bug", "issue", "resolve"
- Commits with similar semantic meaning even without exact word matches
- Results ranked by semantic similarity score
- Only commits with similarity scores above 0.4 are returned

## 4- Create a SQLite Database

### SQLite Database Setup

This section creates a SQLite database to store commit messages in a relational format. The database will have two tables:
- `repositories`: stores repository information (URL and name)
- `commits`: stores commit details with a foreign key reference to repositories

SQLite provides ACID transactions and allows us to perform complex queries on the commit data.

In [12]:
# Remove existing database file if it exists
if os.path.exists('commit_messages.db'):
    os.remove('commit_messages.db')

# Create fresh database connection
conn = sqlite3.connect('commit_messages.db')
cursor = conn.cursor()

**Database Connection Setup:**

This code establishes a connection to the SQLite database:

- **Database File**: Creates or connects to `commit_messages.db`
- **Auto-Creation**: SQLite automatically creates the file if it doesn't exist
- **Cursor Object**: Provides an interface to execute SQL commands
- **Local Storage**: Stores data persistently on the local file system

SQLite is chosen for its simplicity, zero-configuration setup, and ability to handle the commit data efficiently without requiring a separate database server.

Create a connection to the SQLite database file. If the file doesn't exist, SQLite will create it automatically. The cursor object allows us to execute SQL commands.

### Drop existing tables to ensure clean schema

In [13]:
cursor.execute('DROP TABLE IF EXISTS commits')
cursor.execute('DROP TABLE IF EXISTS repositories')

<sqlite3.Cursor at 0x77ad71fc0540>

**Clean Database Reset:**

These SQL commands ensure a fresh start by removing any existing tables:

- **DROP TABLE IF EXISTS**: Safely removes tables without errors if they don't exist
- **Order Matters**: Drops `commits` table first due to foreign key dependency on `repositories`
- **Clean Slate**: Prevents schema conflicts or data inconsistencies from previous runs
- **Idempotent**: Safe to run multiple times without side effects

This approach ensures that each notebook run starts with a consistent, empty database schema.

Clean slate approach: Drop any existing tables to ensure we start with a fresh schema. This prevents conflicts if you run the notebook multiple times or if the table structure has changed.

### Create Tables

In [14]:
cursor.execute('''
    CREATE TABLE IF NOT EXISTS repositories (
        repository_url TEXT PRIMARY KEY,
        repository_name TEXT NOT NULL
    )
''')


<sqlite3.Cursor at 0x77ad71fc0540>

**Repositories Table Schema:**

This creates the parent table for storing repository information:

**Table Structure:**
- **repository_url**: Primary key, unique identifier for each repository
- **repository_name**: Human-readable name of the repository
- **Data Type**: TEXT fields for string data
- **Constraints**: PRIMARY KEY ensures uniqueness, NOT NULL prevents empty values

**Purpose:**
- Normalizes data to avoid redundancy
- Establishes the parent entity in a one-to-many relationship with commits
- Provides a clean separation between repository metadata and commit data

Create the `repositories` table to store unique repository information:
- `repository_url`: Primary key, the unique URL of the repository
- `repository_name`: Human-readable name of the repository

This table normalizes the data to avoid storing repository info redundantly for each commit.

In [15]:
cursor.execute('''
    CREATE TABLE IF NOT EXISTS commits (
        hash TEXT NOT NULL,
        author TEXT NOT NULL,
        date TEXT NOT NULL,
        message TEXT NOT NULL,
        repository_url TEXT NOT NULL,
        commit_similarity_score REAL,
        PRIMARY KEY (repository_url, hash),
        FOREIGN KEY (repository_url) REFERENCES repositories (repository_url)
    )
''')

<sqlite3.Cursor at 0x77ad71fc0540>

**Commits Table Schema:**

This creates the main table for storing detailed commit information:

**Table Structure:**
- **hash**: The unique Git commit SHA identifier
- **author**: Developer who made the commit
- **date**: Timestamp when the commit was made
- **message**: Full commit message content
- **repository_url**: Links to the repositories table (foreign key)
- **commit_similarity_score**: Similarity score of the commit against the search query (REAL, may be NULL)

**Key Constraints:**
- **Composite Primary Key**: (repository_url, hash) ensures uniqueness across repos
- **Foreign Key**: Links commits to their respective repositories
- **NOT NULL**: Every column except `commit_similarity_score` is required

This schema supports querying commits by repository, author, date ranges, or message content.

Create the `commits` table to store commit information:
- `hash`: The unique commit hash/SHA
- `author`: Who made the commit
- `date`: When the commit was made
- `message`: The commit message content
- `repository_url`: Foreign key linking to the repositories table
- `commit_similarity_score`: Similarity score against the search query (nullable)

The composite primary key (repository_url, hash) ensures uniqueness across repositories.
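One caveat about the foreign key: SQLite only enforces foreign keys on connections where `PRAGMA foreign_keys = ON` has been set, and the notebook's connection does not set it. The sketch below uses an in-memory database with the same two tables to show both constraints in action. The helper `try_insert_commit` and all sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default and must be enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE repositories (
        repository_url TEXT PRIMARY KEY,
        repository_name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE commits (
        hash TEXT NOT NULL,
        author TEXT NOT NULL,
        date TEXT NOT NULL,
        message TEXT NOT NULL,
        repository_url TEXT NOT NULL,
        commit_similarity_score REAL,
        PRIMARY KEY (repository_url, hash),
        FOREIGN KEY (repository_url) REFERENCES repositories (repository_url)
    )
""")


def try_insert_commit(connection, repo_url, commit_hash):
    """Insert a sample commit; return False if a constraint rejects it."""
    try:
        connection.execute(
            "INSERT INTO commits (hash, author, date, message, repository_url) "
            "VALUES (?, ?, ?, ?, ?)",
            (commit_hash, "dev@example.com", "2024-01-02", "Fix bug", repo_url),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute(
    "INSERT INTO repositories VALUES (?, ?)",
    ("https://example.com/demo.git", "demo"),
)
known_repo_ok = try_insert_commit(conn, "https://example.com/demo.git", "abc123")
# Rejected by the foreign key: no such repository row exists.
unknown_repo_ok = try_insert_commit(conn, "https://example.com/missing.git", "abc123")
# Rejected by the composite primary key: same (repository_url, hash) pair.
duplicate_ok = try_insert_commit(conn, "https://example.com/demo.git", "abc123")
conn.close()
```

Without the pragma, the second insert would succeed and leave a commit row pointing at a repository that does not exist.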

### Insert data from the similarity search results

In [16]:
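# search_similar_commits returns a message string when nothing matched,
# so only iterate when an actual list of results came back
rows = similarity_commits if isinstance(similarity_commits, list) else []
for commit in rows: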
    # Insert repository
    cursor.execute(
        'INSERT OR IGNORE INTO repositories (repository_url, repository_name) VALUES (?, ?)',
        (commit['repo_url'], commit['repo_name'])
    )
    

    cursor.execute(
        'INSERT OR IGNORE INTO commits (hash, author, date, message, repository_url, commit_similarity_score) VALUES (?, ?, ?, ?, ?, ?)',
        (commit['hash'], commit['author'], commit['date'], commit['message'], commit['repo_url'], commit.get('score'))
    )

conn.commit()
conn.close()

**Data Population Process:**

This code populates the database with the commits returned by the similarity search (`similarity_commits`), not with every parsed commit:

**Two-Step Insertion:**
1. **Repository Insert**: Adds repository metadata first (parent table)
2. **Commit Insert**: Adds commit details with foreign key reference

**Key Features:**
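- **INSERT OR IGNORE**: Skips rows that would violate a uniqueness constraint instead of raising an error
- **Referential Integrity**: Inserting the repository first means every commit points at an existing repository row; note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection
- **Single Transaction**: All inserts are saved together by one `conn.commit()` call
- **Data Consistency**: Each commit row carries the URL of the repository it came from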

**Final Steps:**
- **commit()**: Saves all changes to the database file
- **close()**: Properly closes the database connection and frees resources

This creates a fully populated, normalized database ready for complex queries and analysis.

Insert the commits returned by the similarity search into the database:

1. **Repository insertion**: Use `INSERT OR IGNORE` to add repositories without duplicates
2. **Commit insertion**: Insert each commit with its metadata, linking to the repository
3. **Transaction commit**: Save all changes to the database file
4. **Connection cleanup**: Close the database connection to free resources

The `OR IGNORE` clause prevents errors if we try to insert duplicate data.
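The same two-step insertion can also be written with `executemany` and the connection's context manager, which commits on success and rolls back if an exception escapes the block. This is a sketch against an in-memory database; the rows in `results` are made up and only mirror the shape of the search results (`hash`, `author`, `date`, `message`, `repo_url`, `repo_name`, `score`).

```python
import sqlite3

# Made-up rows shaped like the search results; the last one is a duplicate.
results = [
    {"hash": "abc123", "author": "a@example.com", "date": "2024-01-02",
     "message": "Fix crash", "repo_url": "https://example.com/demo.git",
     "repo_name": "demo", "score": 0.61},
    {"hash": "def456", "author": "b@example.com", "date": "2024-01-03",
     "message": "Fix typo", "repo_url": "https://example.com/demo.git",
     "repo_name": "demo", "score": 0.47},
    {"hash": "abc123", "author": "a@example.com", "date": "2024-01-02",
     "message": "Fix crash", "repo_url": "https://example.com/demo.git",
     "repo_name": "demo", "score": 0.61},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repositories ("
    "repository_url TEXT PRIMARY KEY, repository_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE commits ("
    "hash TEXT NOT NULL, author TEXT NOT NULL, date TEXT NOT NULL, "
    "message TEXT NOT NULL, repository_url TEXT NOT NULL, "
    "commit_similarity_score REAL, PRIMARY KEY (repository_url, hash))"
)

# The with-block wraps both statements in one transaction.
with conn:
    conn.executemany(
        "INSERT OR IGNORE INTO repositories (repository_url, repository_name) "
        "VALUES (:repo_url, :repo_name)",
        results,
    )
    conn.executemany(
        "INSERT OR IGNORE INTO commits "
        "(hash, author, date, message, repository_url, commit_similarity_score) "
        "VALUES (:hash, :author, :date, :message, :repo_url, :score)",
        results,
    )

repo_count = conn.execute("SELECT COUNT(*) FROM repositories").fetchone()[0]
commit_count = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
conn.close()
```

Named placeholders let the same list of dictionaries feed both statements, and `OR IGNORE` silently drops the duplicate row.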

### Query the Database

In [17]:
def query_sqlite(query):
    conn = sqlite3.connect('commit_messages.db')
    cursor = conn.cursor()

    # Execute the provided query
    cursor.execute(query)

    results = cursor.fetchall()
    column_names = [description[0] for description in cursor.description]
    formatted_results = []
    for row in results:
        row_dict = dict(zip(column_names, row))
        # Create a flexible result dictionary based on available columns
        result_entry = {}
        
        # Only format columns that exist in the query results
        if 'hash' in row_dict:
            result_entry['hash'] = row_dict['hash'][:8] + '...'
        if 'author' in row_dict:
            result_entry['author'] = row_dict['author']
        if 'date' in row_dict:
            result_entry['date'] = row_dict['date']
        if 'repository_name' in row_dict:
            result_entry['repository'] = row_dict['repository_name']
        if 'message' in row_dict:
            result_entry['message'] = row_dict['message'][:100] + '...'
        if 'commit_similarity_score' in row_dict:
            result_entry['commit_similarity_score'] = row_dict['commit_similarity_score']
        
        # Add any other columns that weren't specifically formatted
        for col in column_names:
            if col not in ['hash', 'author', 'date', 'repository_name', 'message', 'commit_similarity_score']:
                result_entry[col] = row_dict[col]
                
        formatted_results.append(result_entry)
    conn.close()
    return formatted_results

## 5 - Playground

#### Remember our similarity search was "bug fix" with a score threshold of 0.4.

### SQLite Query Analysis

**What this query does:**

This SQL query analyzes the commit data to find the most relevant commits based on similarity scores:

1. **Joins Tables**: Combines the `commits` and `repositories` tables to get complete information
2. **Filters by Quality**: Only includes commits with similarity scores above 0.5 (high relevance)
3. **Ranks Results**: Uses `ROW_NUMBER()` window function to rank commits within each repository
4. **Orders Results**: Sorts by highest similarity score first, then by repository name and date
5. **Limits Output**: Returns top 20 results to focus on the most relevant matches

**Key Features:**
- **Multi-Repository Analysis**: Shows results across all repositories in the dataset
- **Quality Threshold**: Filters out low-relevance matches (score > 0.5)
- **Repository Ranking**: Shows which commits are most relevant within each repo
- **Comprehensive Data**: Includes commit hash, author, date, message, and similarity score

**Use Case**: Perfect for finding the most semantically similar commits to your search query across multiple repositories, helping identify patterns in how different teams handle similar issues.
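The query described above is not shown in the notebook. The following is a sketch of what such a query could look like, assuming the schema from section 4 and SQLite 3.25 or later (needed for window functions). The SQL text, the alias `rank_in_repo`, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Assumed reconstruction of the described query, not the author's original.
RANKED_QUERY = """
SELECT
    r.repository_name,
    c.hash,
    c.commit_similarity_score,
    ROW_NUMBER() OVER (
        PARTITION BY r.repository_name
        ORDER BY c.commit_similarity_score DESC
    ) AS rank_in_repo
FROM commits c
JOIN repositories r ON c.repository_url = r.repository_url
WHERE c.commit_similarity_score > 0.5
ORDER BY c.commit_similarity_score DESC, r.repository_name, c.date
LIMIT 20;
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repositories ("
    "repository_url TEXT PRIMARY KEY, repository_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE commits ("
    "hash TEXT NOT NULL, author TEXT NOT NULL, date TEXT NOT NULL, "
    "message TEXT NOT NULL, repository_url TEXT NOT NULL, "
    "commit_similarity_score REAL, PRIMARY KEY (repository_url, hash))"
)
conn.executemany(
    "INSERT INTO repositories VALUES (?, ?)",
    [("url-a", "repo-a"), ("url-b", "repo-b")],
)
# Made-up commits: three in repo-a (one below the 0.5 cut-off), one in repo-b.
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("h1", "dev", "2024-01-01", "Fix crash", "url-a", 0.67),
        ("h2", "dev", "2024-01-02", "Fix leak", "url-a", 0.55),
        ("h3", "dev", "2024-01-03", "Add docs", "url-a", 0.45),
        ("h4", "dev", "2024-01-04", "Fix race", "url-b", 0.60),
    ],
)
ranked = conn.execute(RANKED_QUERY).fetchall()
conn.close()
```

Each row carries its rank within its own repository, while the outer `ORDER BY` interleaves repositories by overall score.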

In [18]:
# Alternative Query 1: Author productivity on {bug fix} analysis
author_analysis_query = """
SELECT 
    c.author,
    r.repository_name,
    COUNT(*) as commit_count,
    AVG(c.commit_similarity_score) as avg_similarity_score,
    MAX(c.commit_similarity_score) as max_similarity_score
FROM commits c
JOIN repositories r ON c.repository_url = r.repository_url
WHERE c.commit_similarity_score IS NOT NULL
GROUP BY c.author, r.repository_name
HAVING commit_count >= 2
ORDER BY avg_similarity_score DESC, commit_count DESC
LIMIT 15;
"""

print("=== Author Productivity Analysis ===")
print("Shows authors with multiple commits and their average similarity scores to the search query")
author_results = query_sqlite(author_analysis_query)
for result in author_results:
    print(f"Author: {result['author']} | Repo: {result['repository']} | Commits: {result['commit_count']} | Avg Score: {result['avg_similarity_score']:.3f}")

=== Author Productivity Analysis ===
Shows authors with multiple commits and their average similarity scores to the search query
Author: josh11b@users.noreply.github.com | Repo: carbon-lang | Commits: 2 | Avg Score: 0.541
Author: richard@metafoo.co.uk | Repo: carbon-lang | Commits: 2 | Avg Score: 0.476


In [19]:
# Alternative Query 2: Time-based analysis
time_analysis_query = """
SELECT 
    substr(c.date, 1, 7) as year_month,
    r.repository_name,
    COUNT(*) as relevant_commits,
    AVG(c.commit_similarity_score) as avg_score,
    MIN(c.commit_similarity_score) as min_score,
    MAX(c.commit_similarity_score) as max_score
FROM commits c
JOIN repositories r ON c.repository_url = r.repository_url
WHERE c.commit_similarity_score IS NOT NULL 
    AND c.commit_similarity_score > 0.4
GROUP BY substr(c.date, 1, 7), r.repository_name
ORDER BY year_month DESC, avg_score DESC
LIMIT 10;
"""

print("\n=== Time-based Analysis ===")
print("Shows monthly trends of relevant commits (similarity > 0.4) across repositories")
time_results = query_sqlite(time_analysis_query)
for result in time_results:
    print(f"Month: {result['year_month']} | Repo: {result['repository']} | Count: {result['relevant_commits']} | Avg Score: {result['avg_score']:.3f}")


=== Time-based Analysis ===
Shows monthly trends of relevant commits (similarity > 0.4) across repositories
Month: 2025-06 | Repo: pyrefly | Count: 1 | Avg Score: 0.450
Month: 2023-10 | Repo: carbon-lang | Count: 1 | Avg Score: 0.517
Month: 2023-08 | Repo: carbon-lang | Count: 1 | Avg Score: 0.460
Month: 2022-10 | Repo: carbon-lang | Count: 1 | Avg Score: 0.491
Month: 2022-08 | Repo: carbon-lang | Count: 1 | Avg Score: 0.564
Month: 2022-07 | Repo: carbon-lang | Count: 3 | Avg Score: 0.515
Month: 2021-10 | Repo: carbon-lang | Count: 1 | Avg Score: 0.670
Month: 2021-05 | Repo: carbon-lang | Count: 1 | Avg Score: 0.551


In [None]:
# Alternative Query 3: Repository Quality Analysis - Compare bug fix patterns
repo_quality_query = """
SELECT 
    r.repository_name,
    COUNT(*) as total_commits_analyzed,
    COUNT(CASE WHEN c.commit_similarity_score >= 0.6 THEN 1 END) as high_confidence_bugfixes,
    COUNT(CASE WHEN c.commit_similarity_score >= 0.4 AND c.commit_similarity_score < 0.6 THEN 1 END) as medium_confidence_bugfixes,
    ROUND(
        (COUNT(CASE WHEN c.commit_similarity_score >= 0.4 THEN 1 END) * 100.0 / COUNT(*)), 2
    ) as bug_fix_percentage,
    AVG(c.commit_similarity_score) as avg_similarity_score,
    COUNT(DISTINCT c.author) as unique_contributors
FROM commits c
JOIN repositories r ON c.repository_url = r.repository_url
WHERE c.commit_similarity_score IS NOT NULL
GROUP BY r.repository_name
ORDER BY bug_fix_percentage DESC;
"""

print("\n=== Repository Quality Analysis ===")
print("Compares repositories by their bug fix activity and developer engagement")
quality_results = query_sqlite(repo_quality_query)
for result in quality_results:
    print(f"🏗️ Repository: {result['repository']}")
    print(f"   Total Commits: {result['total_commits_analyzed']} | Contributors: {result['unique_contributors']}")
    print(f"   High Confidence Bug Fixes: {result['high_confidence_bugfixes']} | Medium: {result['medium_confidence_bugfixes']}")
    print(f"   Bug Fix Rate: {result['bug_fix_percentage']}% | Avg Score: {result['avg_similarity_score']:.3f}")
    print("---")


=== Repository Quality Analysis ===
Compares repositories by their bug fix activity and developer engagement
🏗️ Repository: pyrefly
   Total Commits: 1 | Contributors: 1
   High Confidence Bug Fixes: 0 | Medium: 1
   Bug Fix Rate: 100.0% | Avg Score: 0.450
---
🏗️ Repository: carbon-lang
   Total Commits: 9 | Contributors: 7
   High Confidence Bug Fixes: 1 | Medium: 8
   Bug Fix Rate: 100.0% | Avg Score: 0.533
---
