# The Popularity of Prompt Engineering Methods

This notebook produces statistics on Semantic Scholar citations per day for all of the prompt engineering approaches listed at "https://www.promptingguide.ai/papers" on October 10, 2023.


In [1]:
# Packages
import requests
from bs4 import BeautifulSoup
import pandas as pd
from tqdm import tqdm



In [None]:
# Scrape list of "Approaches" papers on "https://www.promptingguide.ai/papers"
# I created an internet archive version of this page on October 10, 2023
url = "https://web.archive.org/web/20231010055104/https%3A%2F%2Fwww.promptingguide.ai%2Fpapers"
page = requests.get(url, timeout=30)
page.raise_for_status()  # stop early if the archived page could not be fetched
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)


In [3]:
# Strip html tags
no_tags = soup.get_text()
print(no_tags)










Papers | Prompt Engineering Guide Prompt Engineering GuidePrompt Engineering CoursePrompt Engineering CourseServicesServicesAboutAboutGitHubGitHub (opens in a new tab)DiscordDiscord (opens in a new tab)Prompt EngineeringIntroductionLLM SettingsBasics of PromptingPrompt ElementsGeneral Tips for Designing PromptsExamples of PromptsTechniquesZero-shot PromptingFew-shot PromptingChain-of-Thought PromptingSelf-ConsistencyGenerate Knowledge PromptingTree of ThoughtsRetrieval Augmented GenerationAutomatic Reasoning and Tool-useAutomatic Prompt EngineerActive-PromptDirectional Stimulus PromptingReActMultimodal CoTGraph PromptingApplicationsProgram-Aided Language ModelsGenerating DataGenerating Synthetic Dataset for RAGTackling Generated Datasets DiversityGenerating CodeGraduate Job Classification Case StudyPrompt FunctionModelsFlanChatGPTLLaMAGPT-4LLM CollectionRisks & MisusesAdversarial PromptingFactualityBiasesPapersToolsNotebooksDatasetsAdditional ReadingsEnglishLightOn This PageOve

In [4]:
# Get approaches by taking text after the line "Approaches" and before the line "Applications"

approaches = no_tags.split("\nApproaches")[1].split("\nApplications")[0]
print(approaches)




Chain-of-Verification Reduces Hallucination in Large Language Models (opens in a new tab) (September 2023)
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers (opens in a new tab) (September 2023)
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting (opens in a new tab) (September 2023)
Re-Reading Improves Reasoning in Language Models (opens in a new tab) (September 2023)
Graph of Thoughts: Solving Elaborate Problems with Large Language Models (opens in a new tab) (August 2023)
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding (opens in a new tab) (July 2023)
Focused Prefix Tuning for Controllable Text Generation (opens in a new tab) (June 2023)
Exploring Lottery Prompts for Pre-trained Language Models (opens in a new tab) (May 2023)
Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses (opens in a new tab) (May 2023)
Let's Verify Step by Step (opens in a new tab) (May 202

In [5]:
# Collapse doubled newlines (any empty lines that remain are skipped when the dataframe is built)
no_empty_lines = approaches.replace("\n\n", "\n")
print(no_empty_lines)



Chain-of-Verification Reduces Hallucination in Large Language Models (opens in a new tab) (September 2023)
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers (opens in a new tab) (September 2023)
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting (opens in a new tab) (September 2023)
Re-Reading Improves Reasoning in Language Models (opens in a new tab) (September 2023)
Graph of Thoughts: Solving Elaborate Problems with Large Language Models (opens in a new tab) (August 2023)
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding (opens in a new tab) (July 2023)
Focused Prefix Tuning for Controllable Text Generation (opens in a new tab) (June 2023)
Exploring Lottery Prompts for Pre-trained Language Models (opens in a new tab) (May 2023)
Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses (opens in a new tab) (May 2023)
Let's Verify Step by Step (opens in a new tab) (May 2023

In [6]:
# Create a dataframe of approaches
# First column is title (content before "(opens in a new tab)")
# Second column is month (content after "(opens in a new tab)")

marker = "(opens in a new tab)"
records = []
for line in no_empty_lines.split("\n"):
    if line != "":
        parts = line.split(marker)
        records.append({"title": parts[0], "month": parts[1]})
# Build the dataframe once instead of concatenating inside the loop
approaches_papers = pd.DataFrame(records, columns=["title", "month"])

print(approaches_papers)


                                                 title              month
0    Chain-of-Verification Reduces Hallucination in...   (September 2023)
1    Connecting Large Language Models with Evolutio...   (September 2023)
2    From Sparse to Dense: GPT-4 Summarization with...   (September 2023)
3    Re-Reading Improves Reasoning in Language Models    (September 2023)
4    Graph of Thoughts: Solving Elaborate Problems ...      (August 2023)
..                                                 ...                ...
143                   Learning from Task Descriptions     (November 2020)
144  AutoPrompt: Eliciting Knowledge from Language ...     (October 2020)
145             Language Models are Few-Shot Learners          (May 2020)
146        How Can We Know What Language Models Know?         (July 2020)
147           Scaling Laws for Neural Language Models      (January 2020)

[148 rows x 2 columns]
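
The split on "(opens in a new tab)" leaves trailing whitespace on the title and parentheses around the month. A minimal sketch of a stricter parse is shown here, assuming every line follows the `title (opens in a new tab) (Month Year)` pattern seen in the output above. `LINE_PATTERN` and `parse_paper_line` are hypothetical names, not part of the original notebook.

```python
import re

# Hypothetical helper: split one scraped line into a clean title and a
# "Month Year" string. Lines that do not fit the pattern return None.
LINE_PATTERN = re.compile(
    r"^(?P<title>.+?)\s*\(opens in a new tab\)\s*\((?P<month>[^)]+)\)\s*$"
)


def parse_paper_line(line):
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return {"title": match.group("title"), "month": match.group("month")}


sample = "Re-Reading Improves Reasoning in Language Models (opens in a new tab) (September 2023)"
parsed = parse_paper_line(sample)
print(parsed)
```

Returning `None` for lines that do not match makes it easy to list the lines that were skipped instead of failing on the first malformed one.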


In [14]:
# Semantic scholar dataframe
semantic_scholar_df = pd.DataFrame()
no_results_df = pd.DataFrame()

# Loop over papers
cutoff_date = "2023-10-10"
search_url = "https://api.semanticscholar.org/graph/v1/paper/search"
search_fields = "paperId,title,citationCount,publicationDate,year"
for paper_title in tqdm(approaches_papers["title"]):
    # Replace hyphens with a space (per documentation)
    query = paper_title.replace("-", " ")
    # Attempt for a returned result
    try:
        # Query semantic scholar; passing `params` lets requests URL-encode the title
        r = requests.get(search_url, params={"query": query, "fields": search_fields, "limit": 1})
        top_hit = r.json()["data"][0]  # parse the response once
        paper_id = top_hit["paperId"]
        semantic_scholar_title = top_hit["title"]
        ss_publication_date = top_hit["publicationDate"]
        ss_year = top_hit["year"]
        # Citation count as of October 10, 2023
        paper_ss_url = f"https://api.semanticscholar.org/v1/paper/{paper_id}"
        paper_data = requests.get(paper_ss_url).json()
        citations = paper_data.get("citations", [])
        # Get paper ids of citing papers (skip entries without an id)
        citing_paper_ids = [citation["paperId"] for citation in citations if citation.get("paperId")]
        # Loop over citing papers. If citing paper was published on or before October 10, 2023, add to count
        constructed_citation_count = 0
        for citing_paper_id in citing_paper_ids:
            # The v1 response carries no publicationDate, so request that field from the graph endpoint
            citing_paper_url = f"https://api.semanticscholar.org/graph/v1/paper/{citing_paper_id}"
            citing_paper_data = requests.get(citing_paper_url, params={"fields": "publicationDate"}).json()
            # Compare the citing paper's date (not the cited paper's); undated papers are not counted
            citing_date = citing_paper_data.get("publicationDate")
            if citing_date and citing_date <= cutoff_date:
                constructed_citation_count += 1
        # Add on paper data
        new_record = pd.DataFrame([{"paper title": paper_title, "semantic scholar title": semantic_scholar_title, "ss_publication_date": ss_publication_date, "ss_year": ss_year, "citation_count": constructed_citation_count, "query": query}])
        semantic_scholar_df = pd.concat([semantic_scholar_df, new_record], ignore_index=True)
    # Error catch for no results or failed requests
    except Exception:
        new_record = pd.DataFrame([{"paper title": paper_title, "query": query}])
        no_results_df = pd.concat([no_results_df, new_record], ignore_index=True)


 34%|███▍      | 50/148 [01:41<04:21,  2.67s/it]
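
The date filter in the loop can be checked without any network calls. The sketch below assumes each citing paper is a dict with a `publicationDate` key holding an ISO `YYYY-MM-DD` string or `None`, which is the field requested from the search endpoint above. `count_citations_through` and the example records are made up for illustration.

```python
from datetime import date


def count_citations_through(citing_papers, cutoff):
    """Count citing papers whose publicationDate is on or before `cutoff`."""
    count = 0
    for paper in citing_papers:
        raw_date = paper.get("publicationDate")
        if not raw_date:
            continue  # undated citing papers are skipped, so the count is a lower bound
        if date.fromisoformat(raw_date) <= cutoff:
            count += 1
    return count


# Made-up records, for illustration only
example = [
    {"paperId": "a", "publicationDate": "2023-09-30"},
    {"paperId": "b", "publicationDate": "2023-10-10"},
    {"paperId": "c", "publicationDate": "2023-11-02"},
    {"paperId": "d", "publicationDate": None},
]
print(count_citations_through(example, date(2023, 10, 10)))  # → 2
```

Parsing the strings into `date` objects avoids relying on string comparison and raises an error on malformed dates rather than miscounting them silently.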

In [None]:
semantic_scholar_df


In [None]:
no_results_df


In [5]:
import requests

# Replace with the actual paper ID you are interested in
paper_id = '8d5a41c94e01c10ed7baf7dd68a851790275a1e5'

url = f'https://api.semanticscholar.org/v1/paper/{paper_id}'

response = requests.get(url)

if response.status_code == 200:
    paper_data = response.json()
    print(paper_data.keys())
    #print(paper_data.get('publicationDate'))
    citations = paper_data.get('citations', [])
    # Get paper ids of citing papers
    citing_paper_ids = [citation['paperId'] for citation in citations]
    # Get publication dates of citing papers in one request (batch lookups go to /paper/batch)
    response = requests.post(
        'https://api.semanticscholar.org/graph/v1/paper/batch',
        params={'fields': 'publicationDate'},
        json={'ids': citing_paper_ids},
    )
    # Filter citations by date here
    #print(citations)
    #filtered_citations = [citation for citation in citations if citation['publicationDate'] <= '2021-10-10']
    #print(filtered_citations)
else:
    print('Failed to retrieve data')


dict_keys(['abstract', 'arxivId', 'authors', 'citationVelocity', 'citations', 'corpusId', 'doi', 'fieldsOfStudy', 'influentialCitationCount', 'isOpenAccess', 'isPublisherLicensed', 'is_open_access', 'is_publisher_licensed', 'numCitedBy', 'numCiting', 'paperId', 'references', 's2FieldsOfStudy', 'title', 'topics', 'url', 'venue', 'year'])
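
The keys listed above include no `publicationDate`, so the citing papers' dates need a second lookup. If that lookup is done in batches, a long list of citing paper ids may have to be split first. The sketch below assumes a limit of 500 ids per request; that limit is an assumption to verify against the API documentation. `chunked` and `build_batch_payloads` are hypothetical helpers.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_payloads(paper_ids, batch_size=500):
    """Build one JSON body per batch request (batch_size of 500 is an assumption)."""
    return [{"ids": chunk} for chunk in chunked(paper_ids, batch_size)]


payloads = build_batch_payloads(["p1", "p2", "p3", "p4", "p5"], batch_size=2)
print(payloads)
```

Each payload can then be sent as the `json=` argument of a separate POST request.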


In [8]:
import requests
import json

r = requests.get(
    'https://api.semanticscholar.org/graph/v1/paper/search?query=covid+vaccination&fields=title,citationCount'#,
    #params={'query': 'Semantic'}
    #params={'fields': 'referenceCount,citationCount,title'}#,
    #json={"ids": ["649def34f8be52c8b66281af98ae884c09aef38b", "ARXIV:2106.15928"]}
)
#print(r)
print(json.dumps(r.json(), indent=2))


{
  "total": 1196604,
  "offset": 0,
  "next": 10,
  "data": [
    {
      "paperId": "8d5a41c94e01c10ed7baf7dd68a851790275a1e5",
      "title": "The Impact of COVID Vaccination on Symptoms of Long COVID: An International Survey of People with Lived Experience of Long COVID",
      "citationCount": 33
    },
    {
      "paperId": "e2fea94b98c5d3f9e5264569422c8250fd224476",
      "title": "Long COVID Risk and Pre-COVID Vaccination: An EHR-Based Cohort Study from the RECOVER Program",
      "citationCount": 14
    },
    {
      "paperId": "0f722d8d3882006980da6ddce74da5bcd8361fb0",
      "title": "Bilateral optic neuritis after COVID vaccination",
      "citationCount": 10
    },
    {
      "paperId": "dedb8621c013a1f6a07e6403f00af32b433e2a81",
      "title": "COVID vaccination in older adults",
      "citationCount": 12
    },
    {
      "paperId": "dca76137d55a9714f41c0b804546181611ff0ca8",
      "title": "The Impact of COVID Vaccination on Symptoms of Long COVID. An International 

In [14]:
# Language Models are Few-Shot Learners
q_string = "Language Models are Few-Shot Learners"
# Put string in URL format
import urllib.parse
q_string_url = urllib.parse.quote(q_string)
print(q_string_url)

# Use the URL-encoded title in the request, and call .json() to print the parsed response
r = requests.get(
    'https://api.semanticscholar.org/graph/v1/paper/search?query=' + q_string_url + '&fields=title,citationCount'
)
print(r.json())



Language%20Models%20are%20Few-Shot%20Learners
{'total': 3625, 'offset': 0, 'next': 10, 'data': [{'paperId': '6b85b63579a916f705a8e10a49bd8d849d91b1fc', 'title': 'Language Models are Few-Shot Learners', 'citationCount': 15624}, {'paperId': '236445f0a3b1e30b2542e5e64616ff6a8af7e3ea', 'title': 'Language Models are Few-shot Learners for Prognostic Prediction', 'citationCount': 21}, {'paperId': 'a6a83754a0d1e9a8e41c1e9bbdbca32d3b9d1fd3', 'title': 'It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners', 'citationCount': 618}, {'paperId': '85e7d63f75c0916bd350a229e040c5fbb1472e7a', 'title': 'Making Pre-trained Language Models Better Few-shot Learners', 'citationCount': 1037}, {'paperId': 'ff0b2681d7b05e16c46dfb71d980cc2f605907cd', 'title': 'Finetuned Language Models Are Zero-Shot Learners', 'citationCount': 1105}, {'paperId': '42fc019b2668c9d9d984154d4c57f6c6d5a91619', 'title': 'Language Models are Few-shot Multilingual Learners', 'citationCount': 64}, {'paperId': 
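
An alternative to encoding the title by hand is to encode the whole query string in one step. This is a minimal sketch using only the standard library; `build_search_url` is a hypothetical helper, and the base URL is the search endpoint used above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(title, fields):
    """Return the search URL with every parameter percent-encoded."""
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use "+")
    query_string = urlencode({"query": title, "fields": fields}, quote_via=quote)
    return BASE_URL + "?" + query_string


url = build_search_url("Language Models are Few-Shot Learners", "title,citationCount")
print(url)
```

Passing a `params` dictionary to `requests.get` achieves the same thing without building the URL by hand.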

In [20]:
# Inspect the title and citation count of the top search hit
import pandas as pd

print(r.json()['data'][0]['title'])
print(r.json()['data'][0]['citationCount'])



15624
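
Because only the top search hit is kept (`limit=1`), the returned title can differ from the title that was queried. A simple similarity check can flag such cases for manual review. This sketch uses `difflib` from the standard library; `title_similarity` and any threshold applied to its result are illustrative assumptions, not part of the original analysis.

```python
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(title.lower().replace("-", " ").split())


def title_similarity(queried, returned):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize_title(queried), normalize_title(returned)).ratio()


exact = title_similarity(
    "Language Models are Few-Shot Learners",
    "Language Models are Few Shot Learners",
)
different = title_similarity(
    "Language Models are Few-Shot Learners",
    "Language Models are Few-shot Learners for Prognostic Prediction",
)
print(exact, different)
```

Rows whose ratio falls below a chosen cutoff could be moved to a review list rather than counted automatically.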

In [39]:
# Add days between publication and October 8, 2023 as a column to semantic_scholar_df
from datetime import datetime
from datetime import date
import numpy as np

# Convert publication date to datetime object
semantic_scholar_df["ss_publication_date"] = pd.to_datetime(semantic_scholar_df["ss_publication_date"])

print(semantic_scholar_df["ss_publication_date"])

# Column for October 8, 2023
semantic_scholar_df["end_date"] = datetime(2023, 10, 8)

# Calculate days between publication and October 8, 2023
semantic_scholar_df["days_from_pub_to_10_8_2023"] = (semantic_scholar_df['end_date'] - semantic_scholar_df['ss_publication_date']) / np.timedelta64(1, 'D')

semantic_scholar_df


0     2023-09-20
1     2023-09-15
2     2023-09-08
3     2023-09-12
4     2023-08-18
         ...    
105   2021-01-01
106   2020-11-01
107   2020-10-29
108   2020-05-28
109   2019-11-28
Name: ss_publication_date, Length: 110, dtype: datetime64[ns]


Unnamed: 0,paper title,semantic scholar title,ss_publication_date,ss_year,citation_count,query,end_date,days_from_pub_to_10_8_2023
0,Chain-of-Verification Reduces Hallucination in...,Chain-of-Verification Reduces Hallucination in...,2023-09-20,2023,3,Chain of Verification Reduces Hallucination in...,2023-10-08,18.0
1,Connecting Large Language Models with Evolutio...,Connecting Large Language Models with Evolutio...,2023-09-15,2023,5,Connecting Large Language Models with Evolutio...,2023-10-08,23.0
2,From Sparse to Dense: GPT-4 Summarization with...,From Sparse to Dense: GPT-4 Summarization with...,2023-09-08,2023,1,From Sparse to Dense: GPT 4 Summarization with...,2023-10-08,30.0
3,Re-Reading Improves Reasoning in Language Models,Re-Reading Improves Reasoning in Language Models,2023-09-12,2023,1,Re Reading Improves Reasoning in Language Models,2023-10-08,26.0
4,Graph of Thoughts: Solving Elaborate Problems ...,Graph of Thoughts: Solving Elaborate Problems ...,2023-08-18,2023,22,Graph of Thoughts: Solving Elaborate Problems ...,2023-10-08,51.0
...,...,...,...,...,...,...,...,...
105,Making Pre-trained Language Models Better Few-...,Making Pre-trained Language Models Better Few-...,2021-01-01,2021,1037,Making Pre trained Language Models Better Few ...,2023-10-08,1010.0
106,Learning from Task Descriptions,Learning from Task Descriptions,2020-11-01,2020,63,Learning from Task Descriptions,2023-10-08,1071.0
107,AutoPrompt: Eliciting Knowledge from Language ...,Eliciting Knowledge from Language Models Using...,2020-10-29,2020,215,AutoPrompt: Eliciting Knowledge from Language ...,2023-10-08,1074.0
108,Language Models are Few-Shot Learners,Language Models are Few-Shot Learners,2020-05-28,2020,15624,Language Models are Few Shot Learners,2023-10-08,1228.0


In [40]:
# Add a column for citations per day
semantic_scholar_df["citations_per_day"] = semantic_scholar_df["citation_count"] / semantic_scholar_df["days_from_pub_to_10_8_2023"]

semantic_scholar_df


Unnamed: 0,paper title,semantic scholar title,ss_publication_date,ss_year,citation_count,query,end_date,days_from_pub_to_10_8_2023,citations_per_day
0,Chain-of-Verification Reduces Hallucination in...,Chain-of-Verification Reduces Hallucination in...,2023-09-20,2023,3,Chain of Verification Reduces Hallucination in...,2023-10-08,18.0,0.166667
1,Connecting Large Language Models with Evolutio...,Connecting Large Language Models with Evolutio...,2023-09-15,2023,5,Connecting Large Language Models with Evolutio...,2023-10-08,23.0,0.217391
2,From Sparse to Dense: GPT-4 Summarization with...,From Sparse to Dense: GPT-4 Summarization with...,2023-09-08,2023,1,From Sparse to Dense: GPT 4 Summarization with...,2023-10-08,30.0,0.033333
3,Re-Reading Improves Reasoning in Language Models,Re-Reading Improves Reasoning in Language Models,2023-09-12,2023,1,Re Reading Improves Reasoning in Language Models,2023-10-08,26.0,0.038462
4,Graph of Thoughts: Solving Elaborate Problems ...,Graph of Thoughts: Solving Elaborate Problems ...,2023-08-18,2023,22,Graph of Thoughts: Solving Elaborate Problems ...,2023-10-08,51.0,0.431373
...,...,...,...,...,...,...,...,...,...
105,Making Pre-trained Language Models Better Few-...,Making Pre-trained Language Models Better Few-...,2021-01-01,2021,1037,Making Pre trained Language Models Better Few ...,2023-10-08,1010.0,1.026733
106,Learning from Task Descriptions,Learning from Task Descriptions,2020-11-01,2020,63,Learning from Task Descriptions,2023-10-08,1071.0,0.058824
107,AutoPrompt: Eliciting Knowledge from Language ...,Eliciting Knowledge from Language Models Using...,2020-10-29,2020,215,AutoPrompt: Eliciting Knowledge from Language ...,2023-10-08,1074.0,0.200186
108,Language Models are Few-Shot Learners,Language Models are Few-Shot Learners,2020-05-28,2020,15624,Language Models are Few Shot Learners,2023-10-08,1228.0,12.723127
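
As a check on the arithmetic, the last row shown above can be reproduced by hand with the standard library. `citations_per_day` is a hypothetical helper; returning `None` for papers with no elapsed days is an assumption added here to avoid division by zero.

```python
from datetime import date


def citations_per_day(citation_count, publication_date, end_date):
    """Citations divided by whole days elapsed; None if no days have elapsed."""
    days = (end_date - publication_date).days
    if days <= 0:
        return None
    return citation_count / days


# Values from the "Language Models are Few-Shot Learners" row above
rate = citations_per_day(15624, date(2020, 5, 28), date(2023, 10, 8))
print(round(rate, 6))  # → 12.723127
```

The 1228 days between May 28, 2020 and October 8, 2023 match the `days_from_pub_to_10_8_2023` value in that row.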


In [43]:
# Sort by citations per day, descending
semantic_scholar_df = semantic_scholar_df.sort_values(by=["citations_per_day"], ascending=False)

# Output to CSV
semantic_scholar_df.to_csv("semantic_scholar_df.csv", index=False)
