#🔹 SOP 1: Google Custom Search API Application and Usage

The Google Custom Search API lets you query Google search results programmatically. Each request returns at most 10 results, and a single query can page through up to 100 results (10 pages).

##✅ 1️⃣ Apply for Google Cloud API

    Go to Google Cloud Console
    Create a new project  
        Click "Create Project"  
        Enter a project name (e.g., GoogleSearchAPI)  
        Select "No organization" for location  
        Click "Create"  
    Ensure you have selected the project  
        Select the newly created project in the top-left corner  

##✅ 2️⃣ Enable Google Custom Search API

    Go to API Console : Custom Search API  
    Click "Enable"  
    Wait for the API to be enabled  

##✅ 3️⃣ Create API Key

    Go to "APIs & Services" → "Credentials" to open the credentials management page  
    Click "Create Credentials" → Select "API Key"  
    Copy the API Key  
    (Optional) Set API restrictions  
        Application Restrictions → Select "None"  
        API Restrictions → Select "Google Custom Search API"  
    Click "Save"  

##✅ 4️⃣ Create Google Custom Search Engine (CSE)

    Go to Google CSE and open the CSE console  
    Click "Add"  
    Under "Sites to search", enter *.google.com or select "Search the entire web"  
    Click "Create"  
    Go to "Control Panel"  
    Copy the "Search Engine ID (CSE ID)"  
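
With the API key and Search Engine ID in hand, a search is a GET request to the `customsearch/v1` endpoint. The sketch below only builds the request URL and makes no network call. It assumes the Custom Search JSON API parameters `num` (at most 10 per request) and `start` (1-based index of the first result); `build_search_url` and the placeholder credentials are illustrative names, not part of this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/customsearch/v1"

def build_search_url(query, api_key, cse_id, page=1, per_page=10):
    """Build a Custom Search request URL for a given 1-based result page."""
    if not 1 <= per_page <= 10:
        raise ValueError("per_page must be between 1 and 10")
    if page < 1:
        raise ValueError("page must be >= 1")
    params = {
        "q": query,
        "key": api_key,
        "cx": cse_id,
        "num": per_page,
        # 'start' is the 1-based index of the first result on this page
        "start": (page - 1) * per_page + 1,
    }
    # urlencode escapes spaces, quotes, and '&' inside the query text
    return f"{BASE_URL}?{urlencode(params)}"

# Placeholder credentials; replace with your own key and CSE ID
print(build_search_url("MIT summer research internship", "YOUR_API_KEY", "YOUR_CSE_ID", page=2))
```

Requesting page 2 with 10 results per page sets `start=11`, which is how later result pages are reached.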

#🔹 SOP 2: Reddit API Application and Usage

Reddit API allows you to query subreddit posts, user information, and even send posts and comments.

##✅ 1️⃣ Apply for Reddit API

    Go to Reddit Developer Platform  
    Log in to your Reddit account  
    Click "Create App"  
    Fill in the app information  
        App Type: Select "script"  
        App Name: Enter an app name (e.g., RedditAPIApp)  
        About URL: Can be left blank  
        Redirect URL: Enter http://localhost:8080  
        Permissions: Default is fine  
    Click "Create App"  
    Note down the Client ID & Client Secret  
        Client ID: 14-character alphanumeric code below the app name  
        Client Secret: The secret key on the right  

Download

In [None]:
import pandas as pd

import requests
# Used for sending HTTP requests to interact with APIs or websites

import nest_asyncio
# Resolves asyncio runtime issues in Jupyter Notebook, allowing async code to run properly in Notebook

import asyncio
# Provides asynchronous I/O support for writing non-blocking asynchronous code

!pip install asyncpraw
import asyncpraw
# An asynchronous client for Reddit API, used for asynchronously fetching Reddit data

from datetime import datetime, timedelta
# Handles dates and times, used for time calculations and formatting

nest_asyncio.apply()
# Enables nest_asyncio to support asyncio's event loop in Jupyter Notebook

Collecting asyncpraw
  Downloading asyncpraw-7.8.1-py3-none-any.whl.metadata (9.0 kB)
Collecting aiofiles (from asyncpraw)
  Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)
Collecting aiosqlite<=0.17.0 (from asyncpraw)
  Downloading aiosqlite-0.17.0-py3-none-any.whl.metadata (4.1 kB)
Collecting asyncprawcore<3,>=2.4 (from asyncpraw)
  Downloading asyncprawcore-2.4.0-py3-none-any.whl.metadata (5.5 kB)
Collecting update_checker>=0.18 (from asyncpraw)
  Downloading update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Downloading asyncpraw-7.8.1-py3-none-any.whl (196 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.4/196.4 kB 8.3 MB/s eta 0:00:00
Downloading aiosqlite-0.17.0-py3-none-any.whl (15 kB)
Downloading asyncprawcore-2.4.0-py3-none-any.whl (19 kB)
Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Downloading aiofiles-24.1.0-py3-none-any.whl (15 kB)
Installing collected packages: aiosqlite, aiofiles, update

#Paste your API keys

In [None]:
#  Google setting
API_KEY = "change to yours"
SEARCH_ENGINE_ID = "change to yours"


#  Reddit API setting
reddit = asyncpraw.Reddit(
    client_id="change to yours",
    client_secret="change to yours",
    user_agent="RedditJobSearchBot/1.0"
)
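
Hard-coded keys are easy to leak when a notebook is shared. One alternative, sketched here as an assumption rather than this notebook's method, is to read them from environment variables. The helper `load_setting` and the variable name `GOOGLE_API_KEY` are hypothetical; pick whatever names fit your setup.

```python
import os

def load_setting(name):
    """Return the environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing setting: set the {name} environment variable")
    return value

# Demo only: provide a dummy value so this example runs as-is.
# In practice, set the variable outside the notebook and delete this line.
os.environ.setdefault("GOOGLE_API_KEY", "demo-key")

api_key = load_setting("GOOGLE_API_KEY")
print("Google API key loaded:", bool(api_key))
```

The same helper could supply `client_id` and `client_secret` for the Reddit client.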

In [None]:
#  Load all sheets from the Excel
file_path = "/content/2025QS.xlsx"
sheets = pd.read_excel(file_path, sheet_name=None)

In [None]:
#  Extract sheets for Social Science and World Rankings
df_social_science = sheets["Social Science"]
df_world_rankings = sheets["World Rankings"]

#  Normalize country names for consistency across data sources
country_mapping = {
    "United States of America": "USA", "United States": "USA",
    "United Kingdom": "UK", "China (Mainland)": "China",
    "Taiwan": "Taiwan", "Russian Federation": "Russia",
    "South Korea": "South Korea", "Republic of Korea": "South Korea",
    "Hong Kong SAR": "Hong Kong", "Macau SAR": "Macau"
}
df_social_science["location"] = df_social_science["location"].replace(country_mapping)
df_world_rankings["location code"] = df_world_rankings["location code"].replace(country_mapping)

#  Convert ranking columns to numeric and drop rows with missing rankings
df_social_science["2025"] = pd.to_numeric(df_social_science["2025"], errors="coerce")
df_world_rankings["rank display"] = pd.to_numeric(df_world_rankings["rank display"], errors="coerce")
df_social_science.dropna(subset=["2025"], inplace=True)
df_world_rankings.dropna(subset=["rank display"], inplace=True)

#  Retrieve the top N universities in a given country
def get_top_n_universities(df, country_col, rank_col, name_col, country, n):
    df[country_col] = df[country_col].astype(str).str.strip()
    country = country.strip().lower()
    country_df = df[df[country_col].str.lower() == country]
    if len(country_df) == 0:
        return []
    country_df = country_df.sort_values(by=rank_col, ascending=True).reset_index(drop=True)
    return country_df.iloc[:n][name_col].tolist()
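
The selection logic above (case-insensitive country match, sort by rank, keep the first N names) can be checked on a few rows without loading the Excel file. This is a plain-Python analogue of the pandas function, not a replacement for it; `top_n` and the rows are made-up placeholders, not real QS data.

```python
def top_n(records, country, n):
    """Return the names of the n best-ranked records for a country."""
    wanted = country.strip().lower()
    matches = [r for r in records if r["location"].strip().lower() == wanted]
    matches.sort(key=lambda r: r["rank"])  # rank 1 is best
    return [r["name"] for r in matches[:n]]

# Made-up placeholder rows, not real QS data
rows = [
    {"name": "University C", "location": "USA ", "rank": 12},
    {"name": "University A", "location": "usa", "rank": 3},
    {"name": "University B", "location": "UK", "rank": 5},
]

print(top_n(rows, " USA", 2))   # ['University A', 'University C']
print(top_n(rows, "Japan", 1))  # []
```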


In [None]:
#  Use Google Search API to look for relevant research terms
def search_google(query):
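    # Pass the query via params so requests URL-encodes spaces, quotes, and "&"
    params = {"q": query, "key": API_KEY, "cx": SEARCH_ENGINE_ID, "num": 5}
    response = requests.get("https://www.googleapis.com/customsearch/v1", params=params, timeout=10)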
    data = response.json()
    results = []
    for item in data.get("items", []):
        results.append((item["title"], item["link"]))
    return results
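
When the key is invalid or the daily quota is used up, the response carries no `items` and the function above silently returns an empty list. The sketch below separates that case from a genuine "no results" response. It assumes the error payload follows Google's usual JSON shape (`{"error": {"code": ..., "message": ...}}`); `extract_results` and the sample payloads are hand-written for illustration, not real API output.

```python
def extract_results(data):
    """Turn a decoded Custom Search response into (title, link) tuples."""
    if "error" in data:
        err = data["error"]
        raise RuntimeError(f"API error {err.get('code')}: {err.get('message')}")
    # A response with no matches simply has no "items" key
    return [(item["title"], item["link"]) for item in data.get("items", [])]

# Hand-written sample payloads for illustration, not real API output
ok = {"items": [{"title": "Example page", "link": "https://example.com/"}]}
empty = {"searchInformation": {"totalResults": "0"}}

print(extract_results(ok))     # [('Example page', 'https://example.com/')]
print(extract_results(empty))  # []
```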

In [None]:
#  Search across Reddit for internship/research opportunities
async def search_reddit(universities, keywords):
    four_months_ago = datetime.now() - timedelta(days=120)
    print("\n【Searching Reddit posts】\n")

    results_found = 0
    subreddit = await reddit.subreddit("all")

    for uni in universities:
        for keyword in keywords:
            query = f'"{uni}" {keyword}'
            print(f"\nQuery: {query}")

            async for post in subreddit.search(query, limit=30, sort="new"):
                post_time = datetime.fromtimestamp(post.created_utc)

                if post_time >= four_months_ago:
                    print(f"Title: {post.title}")
                    print(f"Date: {post_time.strftime('%Y-%m-%d')}")
                    print(f"Link: {post.url}")
                    print(f"Subreddit: r/{post.subreddit}")
                    print("-" * 50)
                    results_found += 1

    if results_found == 0:
        print("No relevant Reddit posts found.")
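
The four-month cutoff above compares naive local times on both sides, which is consistent but depends on the machine's time zone. Since `created_utc` is a Unix timestamp, a timezone-aware variant is sketched below; `is_recent` is a hypothetical helper, not part of the notebook.

```python
from datetime import datetime, timedelta, timezone

def is_recent(created_utc, now=None, days=120):
    """Return True if a Unix timestamp falls within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    posted = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return posted >= now - timedelta(days=days)

# Fixed reference time so the example is reproducible
ref = datetime(2025, 6, 1, tzinfo=timezone.utc)
may_post = datetime(2025, 5, 1, tzinfo=timezone.utc).timestamp()
dec_post = datetime(2024, 12, 1, tzinfo=timezone.utc).timestamp()

print(is_recent(may_post, now=ref))  # True
print(is_recent(dec_post, now=ref))  # False
```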

Enter the ranking type (1 for the QS overall ranking, 2 for the QS social science ranking), your target country, and the number of top universities you want.
For example, to get the top 5 universities in China:
First enter 1 (QS overall ranking), then enter China (USA, UK, Russia, Japan, Hong Kong, India, Taiwan, South Korea, etc. also work), then enter 5.
You should see results like Tsinghua, Peking, Fudan, Jiaotong, and Zhejiang.
If you enter 1, then Russia, then 2, you should get HSE and MSU (just as I dreamed).
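
The three answers described above can be validated before any search runs. This is a minimal sketch with a hypothetical helper, `parse_inputs`. Note one assumption: it rejects any ranking type other than 1 or 2, whereas the cell below treats every value other than 1 as the social science ranking.

```python
def parse_inputs(ranking_type, country, n):
    """Validate the three answers and return (sheet_name, country, n)."""
    sheets_by_type = {"1": "World Rankings", "2": "Social Science"}
    ranking_type = ranking_type.strip()
    if ranking_type not in sheets_by_type:
        raise ValueError("ranking type must be 1 or 2")
    country = country.strip()
    if not country:
        raise ValueError("country must not be empty")
    n = n.strip()
    if not n.isdecimal() or int(n) < 1:
        raise ValueError("n must be a positive whole number")
    return sheets_by_type[ranking_type], country, int(n)

print(parse_inputs("1", " China ", "5"))  # ('World Rankings', 'China', 5)
```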

In [None]:
#  Main query function to fetch universities and trigger Google/Reddit searches
def university_query():
    print("請選擇查詢類型 / 请选择查询类型 /\n 種類を選択してください / 조회 유형 선택 /\n Выберите тип запроса /\n Choisissez le type de requête / Wählen Sie den Abfragetyp /\n Please select the query type")
    ranking_type = input("(1: QS綜合排名 World Ranking, 2: QS社會科學排名 Social Science Ranking): ").strip()

    print("請輸入要查詢的國家 / 输入国家 /\n 国を入力してください / 국가를 입력하세요 /\n Введите страну /\n Entrez le pays / Geben Sie das Land ein /\n Enter the country")
    country = input("例如 e.g., USA、UK、Russia: ").strip()

    print("請輸入該國前幾名大學 / 输入该国前N名大学 /\n 上位校を入力 / 상위 대학 수 입력 /\n Введите число лучших вузов /\n Entrez le nombre d'universités / Geben Sie die Anzahl der Top-Unis ein /\n Enter the top N universities:")
    n = input("數字 only number: ").strip()

    if not n.isdigit():
        print("請輸入正確數字 / 请输入正确的数字 /\n 数字を入力してください / 숫자를 입력하세요 /\n Введите правильное число /\n Entrez un nombre valide / Geben Sie eine gültige Zahl ein /\n Please enter a valid number.")
        return

    n = int(n)
    if ranking_type == '1':
        universities = get_top_n_universities(df_world_rankings, "location code", "rank display", "institution", country, n)
    else:
        universities = get_top_n_universities(df_social_science, "location", "2025", "Institution", country, n)

    if not universities:
        print(f"找不到 {country} 的學校 / 未找到 {country} 的大学 /\n {country} の大学が見つかりません / {country}의 대학을 찾을 수 없습니다 /\n Университеты {country} не найдены /\n Aucune université trouvée pour {country} / Keine Universität in {country} gefunden /\n No universities found in {country}")
        return

    print(f"\n{country.upper()} 前 {n} 名大學 / Top {n} Universities in {country.upper()}:")
    for i, uni in enumerate(universities, 1):
        print(f"{i}. {uni}")

    search_keywords = [
        "summer research internship", "remote research assistant", "research scholarship",
    ]

    print("\n正在使用 Google 搜尋研究機會 / 正在用Google搜索研究机会 /\n Googleで研究機会を検索中 / Google을 사용하여 연구 기회 검색 중 /\n Идёт поиск исследовательских возможностей в Google /\n Recherche d'opportunités de recherche via Google / Recherche über Google nach Forschungsangeboten /\n Searching research opportunities using Google...\n")
    for uni in universities:
        print(f"\n{uni} 的搜尋結果 / Search results for {uni}:")
        for keyword in search_keywords:
            query = f"{uni} {keyword}"
            results = search_google(query)
            for title, link in results:
                print(f"{title}\n{link}")

    print("\n搜尋 Reddit 貼文中 / 正在搜索Reddit /\n Redditの投稿を検索中 / Reddit 게시물 검색 중 /\n Поиск по Reddit /\n Recherche sur Reddit / Suche in Reddit /\n Searching Reddit...\n")
    loop = asyncio.get_event_loop()
    loop.run_until_complete(search_reddit(universities, search_keywords))

# ✅ Start the search process
university_query()

請選擇查詢類型 / 请选择查询类型 /
 種類を選択してください / 조회 유형 선택 /
 Выберите тип запроса /
 Choisissez le type de requête / Wählen Sie den Abfragetyp /
 Please select the query type
(1: QS綜合排名 World Ranking, 2: QS社會科學排名 Social Science Ranking): 1
請輸入要查詢的國家 / 输入国家 /
 国を入力してください / 국가를 입력하세요 /
 Введите страну /
 Entrez le pays / Geben Sie das Land ein /
 Enter the country
例如 e.g., USA、UK、Russia: usa
請輸入該國前幾名大學 / 输入该国前N名大学 /
 上位校を入力 / 상위 대학 수 입력 /
 Введите число лучших вузов /
 Entrez le nombre d'universités / Geben Sie die Anzahl der Top-Unis ein /
 Enter the top N universities:
數字 only number: 1

USA 前 1 名大學 / Top 1 Universities in USA:
1. Massachusetts Institute of Technology (MIT) 

正在使用 Google 搜尋研究機會 / 正在用Google搜索研究机会 /
 Googleで研究機会を検索中 / Google을 사용하여 연구 기회 검색 중 /
 Идёт поиск исследовательских возможностей в Google /
 Recherche d'opportunités de recherche via Google / Recherche über Google nach Forschungsangeboten /
 Searching research opportunities using Google...


Massachusetts Institute of Technology 