# Objectives

1. **Search for candidate repos** (containing prompts in .py/.txt files) using:
    - LangChain
    - Guidance (by Microsoft)
    - LlamaIndex
2. **Find Prompts**:
    - 2.1. Filter down via dirs and files
        - Look at dirs for 'template' or 'prompt' folders (or files)
    - 2.2. Filter down via code search *(hint: use tree-sitter)*:
        - Does the file import a library such as openai, huggingface, etc.?
        - Are the prompts stored in separate files or in strings?
        - How many use variables?
        - Do they use concatenation, f-strings, `format`, etc.?
3. Run the professor's **sllim check tool** on them

Professor's tool: https://github.com/kpister/sllim (semantic analysis to detect errors in prompt files)
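
Step 2.1 could be sketched as a simple path filter. This is only a minimal sketch: the names `KEYWORDS`, `EXTENSIONS`, `is_candidate_path`, and `filter_candidate_paths` are hypothetical, and it assumes repo file paths are already available as POSIX-style strings (for example from a cloned checkout).

```python
from pathlib import PurePosixPath

# Assumption: these keyword and extension sets mirror the notes above
# ('template' or 'prompt' folders/files, prompts in .py/.txt files).
KEYWORDS = ("prompt", "template")
EXTENSIONS = {".py", ".txt"}


def is_candidate_path(path: str) -> bool:
    """Return True if a .py/.txt path has a keyword in any dir or file name."""
    p = PurePosixPath(path)
    if p.suffix.lower() not in EXTENSIONS:
        return False
    # Check every path component, so both folders and file names count.
    return any(k in part.lower() for part in p.parts for k in KEYWORDS)


def filter_candidate_paths(paths):
    """Keep only the paths that look like they may hold prompts."""
    return [p for p in paths if is_candidate_path(p)]
```

A plain substring match like this will over-select (e.g. a `templates/` folder of HTML-rendering code), so it is best treated as a coarse first pass before the code search in step 2.2.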

### 📚 Candidate Repos

In [111]:
import requests, json
from pprint import pprint

def fetch_data(query="langchain+OR+GUIDANCE+OR+LlamaIndex", sort="stars", order="asc", per_page=100, language="python"):
    """
    Search GitHub repositories via the REST search API.
    DOCS: https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#search-repositories

    Default params:
        query = "langchain+OR+GUIDANCE+OR+LlamaIndex"
        sort = "stars"
        order = "asc"
        per_page = 100  # Max 100
        language = "python"

    Returns a results dict with the following structure:
    {
        total_count: int,
        items: [{repo1_info}, {repo2_info}, ...]
    }
    """
    # Setting up result dict and file
    result = {"total_count": 0, "items": []}
    
    # NOTE: Only the first 1000 search results are available through this API
    print("Fetching all 10 pages (assuming there're >= 1000 results)")
    for page in range(1, 11):
        url = f"https://api.github.com/search/repositories?q={query}+language:{language}&sort={sort}&order={order}&per_page={per_page}&page={page}"
        # Make the API request and get the JSON response
        response = requests.get(url)
        data = response.json()

        # Check if the request was successful
        if response.status_code != 200:
            raise Exception(data.get("message", "Unknown error"))

        # Check if the API returned an error
        if "message" in data:
            raise Exception(data["message"])
        
        # Add this page's items to the combined result (nothing is written to file here)
        result["items"].extend(data["items"])
        print(f"Page {page} done")

        # TODO: Remove Later ######################
        # TESTING
        print(url)
        break
        ###########################################

    # Add the total count to the result
    result["total_count"] = data["total_count"]

    return result

# WARNING: AVOID EXCEEDING THE API RATE LIMIT.
# STORING RESULT
##################################################
# res = fetch_data()
# with open("repos.json", "w") as file:
#     json.dump(res, file, indent=4)
#     pprint(res)

Fetching all 10 pages (assuming there're >= 1000 results)
https://api.github.com/search/repositories?q=langchain+OR+GUIDANCE+OR+LlamaIndex+language:python&sort=stars&order=asc&per_page=100&page=1
Page 1 done
https://api.github.com/search/repositories?q=langchain+OR+GUIDANCE+OR+LlamaIndex+language:python&sort=stars&order=asc&per_page=100&page=2
Page 2 done
https://api.github.com/search/repositories?q=langchain+OR+GUIDANCE+OR+LlamaIndex+language:python&sort=stars&order=asc&per_page=100&page=3
Page 3 done
https://api.github.com/search/repositories?q=langchain+OR+GUIDANCE+OR+LlamaIndex+language:python&sort=stars&order=asc&per_page=100&page=4
Page 4 done
https://api.github.com/search/repositories?q=langchain+OR+GUIDANCE+OR+LlamaIndex+language:python&sort=stars&order=asc&per_page=100&page=5
Page 5 done
https://api.github.com/search/repositories?q=langchain+OR+GUIDANCE+OR+LlamaIndex+language:python&sort=stars&order=asc&per_page=100&page=6
Page 6 done
https://api.github.com/search/repositories

Exception: API rate limit exceeded for 50.93.222.44. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
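
The exception above points at authenticated requests as the way to a higher rate limit. Below is a minimal sketch of how the request could send a token and pause when the limit is exhausted. The helper names (`build_headers`, `seconds_until_reset`, `get_with_backoff`) are hypothetical, and the token is assumed to be a personal access token supplied by the caller. The `x-ratelimit-remaining` and `x-ratelimit-reset` headers are the ones GitHub returns on REST responses.

```python
import time


def build_headers(token=None):
    """Request headers for the GitHub REST API; adds auth only if a token is given."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_until_reset(response_headers, now=None):
    """Seconds to wait if the rate-limit window is exhausted, else 0."""
    now = time.time() if now is None else now
    remaining = int(response_headers.get("x-ratelimit-remaining", 1))
    reset_at = int(response_headers.get("x-ratelimit-reset", 0))  # epoch seconds
    if remaining > 0:
        return 0.0
    # One extra second of slack so the next call lands after the reset.
    return max(0.0, reset_at - now) + 1.0


def get_with_backoff(url, token=None):
    """GET a URL, then sleep if this response used up the last allowed request."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(url, headers=build_headers(token), timeout=30)
    wait = seconds_until_reset(response.headers)
    if wait:
        time.sleep(wait)
    return response
```

Inside `fetch_data`, the call `requests.get(url)` could then be swapped for something like `get_with_backoff(url, token)`, with the token read from an environment variable rather than hard-coded in the notebook.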

### 🔎 Find Prompts

### ✅ Check the Prompts Using sllim