# Finding meta-analyses of MVTec AD through Semantic Scholar's API

I will attempt this with the following strategy:

1. get the two MVTec AD papers (2019 and 2021)
2. list all the papers that cite them
3. for each paper citing (1):
   1. get all its references
   2. count how many of them are in (2)

I expect a paper that cites many papers that themselves cite MVTec AD to be a review, a survey, or a meta-analysis.
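The strategy above can be sketched on a toy citation graph, without touching the API. The paper ids (`"a"`, `"b"`, `"c"`, `"survey"`) and the helper name `count_shared_references` are made up for illustration; only the counting logic mirrors the steps listed above.

```python
def count_shared_references(seed_ids, citations, references):
    """For each paper citing a seed paper, count how many of its
    references also cite a seed paper."""
    # step 2: every paper that cites at least one seed paper
    citing = set()
    for seed in seed_ids:
        citing.update(citations.get(seed, ()))
    # step 3: for each citing paper, intersect its references with that set
    return {
        paper: len(set(references.get(paper, ())) & citing)
        for paper in citing
    }


# hypothetical toy graph: seed paper -> papers citing it
citations = {
    "mvtec2019": ["a", "b", "survey"],
    "mvtec2021": ["b", "c", "survey"],
}
# hypothetical toy graph: paper -> papers it references
references = {
    "a": ["mvtec2019"],
    "b": ["mvtec2019", "mvtec2021", "a"],
    "c": ["mvtec2021"],
    "survey": ["mvtec2019", "mvtec2021", "a", "b", "c"],
}

scores = count_shared_references(["mvtec2019", "mvtec2021"], citations, references)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranked[0])  # the paper sharing the most references comes first
```

In this toy graph the `"survey"` paper references three other papers that cite the seeds, so it ranks first, which is the pattern expected from a review.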

[Semantic Scholar's API web page](https://www.semanticscholar.org/product/api).

[[DOC] `/graph/v1/paper/{paper_id}` ](https://api.semanticscholar.org/api-docs/graph#tag/Paper-Data/operation/get_graph_get_paper)

Note: the API is limited to 100 requests per 5 minutes.
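To make the effect of such a limit concrete, here is a minimal standard-library sketch of a sliding-window limiter. It is an illustration only: the class name `SlidingWindowLimiter` and its parameters are assumptions, and it is not how the `ratelimit` package used in this notebook is implemented.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `calls` calls in any window of `period` seconds."""

    def __init__(self, calls, period, clock=time.monotonic, sleep=time.sleep):
        self.calls = calls
        self.period = period
        self.clock = clock    # injectable so the demo below needs no real waiting
        self.sleep = sleep
        self.stamps = deque()  # timestamps of the most recent calls

    def wait(self):
        """Block until one more call is allowed, then record it."""
        now = self.clock()
        # forget calls that have already left the window
        while self.stamps and now - self.stamps[0] >= self.period:
            self.stamps.popleft()
        if len(self.stamps) >= self.calls:
            # sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.stamps[0]))
            self.stamps.popleft()
        self.stamps.append(self.clock())


# demo with a fake clock: 2 calls allowed per 10 "seconds"
fake_now = [0.0]
slept = []


def fake_sleep(delay):
    slept.append(delay)
    fake_now[0] += delay


limiter = SlidingWindowLimiter(2, 10.0, clock=lambda: fake_now[0], sleep=fake_sleep)
limiter.wait()       # t=0, allowed
fake_now[0] = 1.0
limiter.wait()       # t=1, allowed
fake_now[0] = 2.0
limiter.wait()       # t=2, must wait until t=10
print(slept)
```

With 100 requests per 5 minutes, fetching the 293 citing papers needs at least two full waits, which is roughly consistent with the elapsed time of about 10 minutes reported further down.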

# `000` --> `001` changes

I do the same as in `000`, but also include each paper's authors and publication date to facilitate manual inspection.

In [1]:
from urllib.request import urlopen
import json

# I found this solution to limit the number of requests to semanticscholar.org
# src: https://stackoverflow.com/a/64845203/9582881
from ratelimit import limits, sleep_and_retry


@sleep_and_retry
@limits(calls=100, period=5 * 60)  # 100 requests per 5 minutes
def get_paper_data(paper_id: str) -> dict:
    url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}?fields=title,citations,references,authors,publicationDate,abstract,url"
    with urlopen(url) as response:
        return json.loads(response.read().decode())

In [2]:
# https://www.semanticscholar.org/paper/MVTec-AD-%E2%80%94-A-Comprehensive-Real-World-Dataset-for-Bergmann-Fauser/3aa681914a7da79f7d7293f51a058eefe61c8bb7
PAPERID_MVTEC_2019 = "3aa681914a7da79f7d7293f51a058eefe61c8bb7"

# https://www.semanticscholar.org/paper/The-MVTec-Anomaly-Detection-Dataset%3A-A-Real-World-Bergmann-Batzner/48f9a48aa5b1230b05a443d2d531e6441a541686
PAPERID_MVTEC_2021 = "48f9a48aa5b1230b05a443d2d531e6441a541686"

mvtec_papers_ids = (PAPERID_MVTEC_2019, PAPERID_MVTEC_2021)

In [3]:
# calls to the api!
mvtec_papers_data = {paper_id: get_paper_data(paper_id) for paper_id in mvtec_papers_ids}

In [4]:
papers_citing_mvtec_ids = {
    paper_data['paperId']
    for mvtec_paper_data in mvtec_papers_data.values()
    for paper_data in mvtec_paper_data["citations"]
}
print(f"{len(papers_citing_mvtec_ids)=}")

len(papers_citing_mvtec_ids)=293


In [5]:
from progressbar import progressbar

# calls to the api!
papers_citing_mvtec_data = {
    paper_id: get_paper_data(paper_id)
    for paper_id in progressbar(papers_citing_mvtec_ids, max_value=len(papers_citing_mvtec_ids))
}
print(f"{len(papers_citing_mvtec_data)=}")

100% (293 of 293) |######################| Elapsed Time: 0:10:44 Time:  0:10:44


len(papers_citing_mvtec_data)=293


In [12]:
# count how many references each paper has in common with the papers citing mvtec
papers_citing_mvtec_records = {
    paper_id: {
        # things that I just copy from the paper data
        **{k: paper_data[k] for k in ("paperId", "title", "publicationDate", "abstract", "url")},
        "refences_ids": (refences_ids := {ref["paperId"] for ref in paper_data["references"]}),
        "nrefences": len(refences_ids),
        "references_ids_incommon_mvtec_citations": refences_ids.intersection(papers_citing_mvtec_ids),
        "nreferences_incommon_mvtec_citations": len(refences_ids.intersection(papers_citing_mvtec_ids)),
        "authors": (authors := [author["name"] for author in paper_data["authors"]]),
        "authors_first": authors[0] if authors else "NOT FOUND",
        "authors_list": ", ".join(authors),
    }
    for paper_id, paper_data in papers_citing_mvtec_data.items()
}

from pathlib import Path
DATA_DIR = Path(".") / "data"
DATA_DIR.mkdir(exist_ok=True)

import pandas as pd
df = pd.DataFrame.from_records(list(papers_citing_mvtec_records.values())).set_index("paperId")
df["percent_references_incommon_mvtec_citations"] = 100 * df["nreferences_incommon_mvtec_citations"] / df["nrefences"]
df.to_csv(DATA_DIR / "001_papers_citing_mvtec_records.csv", index=True)
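Two edge cases are worth keeping in mind when reading these counts. The sketch below assumes that some entries in `references` can come back with a null `paperId` (references the API could not resolve) and that a paper can have no resolved references at all. The helper name `shared_reference_stats` is hypothetical and is not part of the notebook's pipeline.

```python
def shared_reference_stats(references, citing_ids):
    """Count resolved references and how many of them are in `citing_ids`."""
    # assumption: unresolved references have paperId set to None, so skip them
    ref_ids = {ref["paperId"] for ref in references if ref.get("paperId")}
    shared = ref_ids & set(citing_ids)
    n_references = len(ref_ids)
    # avoid dividing by zero when no reference could be resolved
    percent = 100 * len(shared) / n_references if n_references else None
    return {
        "n_references": n_references,
        "n_shared": len(shared),
        "percent_shared": percent,
    }


example = shared_reference_stats(
    [{"paperId": "a"}, {"paperId": None}, {"paperId": "b"}, {"paperId": None}],
    citing_ids={"a", "c"},
)
empty = shared_reference_stats([], citing_ids={"a", "c"})
print(example)
print(empty)
```

Without the filter, all null ids would collapse into a single `None` element of the set and add one to the reference count, which slightly lowers the percentage.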

In [14]:
VIZ_COLUMNS = [
    "title", "nrefences", "nreferences_incommon_mvtec_citations", "percent_references_incommon_mvtec_citations",
    "url", "authors_first", "authors_list", "publicationDate", "abstract",
]

selection00 = df.sort_values("nreferences_incommon_mvtec_citations", ascending=False).head(15)
selection01 = df.sort_values("percent_references_incommon_mvtec_citations", ascending=False).head(15)

selection = pd.concat([selection00, selection01], axis=0)[VIZ_COLUMNS].drop_duplicates()
selection.to_csv(DATA_DIR / "001_selection_likely_meta_analyses.csv", index=True)

In [15]:
selection.shape

(26, 9)

In [16]:
selection

| paperId | title | nrefences | nreferences_incommon_mvtec_citations | percent_references_incommon_mvtec_citations | url | authors_first | authors_list | publicationDate | abstract |
|---|---|---|---|---|---|---|---|---|---|
| 51ba3b33f445199d9f3cddb5b00c7e2927199b0c | Deep Learning for Unsupervised Anomaly Localiz... | 125 | 61 | 48.8 | https://www.semanticscholar.org/paper/51ba3b33... | Xian Tao | Xian Tao, Xinyi Gong, X. Zhang, Shaohua Yan, C... | 2022-07-21 | Abstract — Currently, deep learning-based vi... |
| 73cd41517018945fc17300adeb91447bfede6a53 | A Survey on Unsupervised Industrial Anomaly De... | 125 | 33 | 26.4 | https://www.semanticscholar.org/paper/73cd4151... | Yajie Cui | Yajie Cui, Zhaoxiang Liu, Shiguo Lian | 2022-04-24 | In line with the development of Industry 4.0, ... |
| f98dbe64ed6fa8925048291fcceb625d704fb294 | Self-Supervised Anomaly Detection: A Survey an... | 136 | 26 | 19.117647 | https://www.semanticscholar.org/paper/f98dbe64... | H. Hojjati | H. Hojjati, Thi Kieu Khanh Ho, N. Armanfard | 2022-05-10 | —Over the past few years, anomaly detection, a... |
| 1e78fb6e1c95e0831f6e9f20186534f3bf20db4e | A Unified Model for Multi-class Anomaly Detection | 50 | 21 | 42.0 | https://www.semanticscholar.org/paper/1e78fb6e... | Zhiyuan You | Zhiyuan You, Lei Cui, Yujun Shen, K. Yang, Xin... | 2022-06-08 | Despite the rapid advance of unsupervised anom... |
| 32aab0c5468d9692877c53efc5612957e48205b2 | Data refinement for fully unsupervised visual ... | 51 | 21 | 41.176471 | https://www.semanticscholar.org/paper/32aab0c5... | Antoine Cordier | Antoine Cordier, Benjamin Missaoui, Pierre Gut... | 2022-02-25 | Anomaly detection has recently seen great prog... |
| 76077e1e12908b525907e7c3419368291f5965ab | Benchmarking Unsupervised Anomaly Detection an... | 38 | 19 | 50.0 | https://www.semanticscholar.org/paper/76077e1e... | Ye Zheng | Ye Zheng, Xiang Wang, Yu-Hang Qi, Wei Li, Liwe... | 2022-05-30 | Unsupervised anomaly detection and localizatio... |
| 0f23a44418aabe3344c6f3809d6a8ab898292813 | Data Invariants to Understand Unsupervised Out... | 83 | 18 | 21.686747 | https://www.semanticscholar.org/paper/0f23a444... | Lars Doorenbos | Lars Doorenbos, R. Sznitman, Pablo M'arquez-Neila | 2021-11-26 | . Unsupervised out-of-distribution (U-OOD) det... |
| 08b66b00a9ce6ab8090668f9f848c5659617bae9 | Surface Defect Detection Methods for Industria... | 139 | 17 | 12.230216 | https://www.semanticscholar.org/paper/08b66b00... | Yajun Chen | Yajun Chen, Yuanyuan Ding, Fan Zhao, E. Zhang,... | 2021-08-20 | The comprehensive intelligent development of t... |
| 0795b7462b6d186d4ccd63d185a25f54d56aaf5f | Explicit Boundary Guided Semi-Push-Pull Contra... | 47 | 16 | 34.042553 | https://www.semanticscholar.org/paper/0795b746... | Xincheng Yao | Xincheng Yao, Chongyang Zhang, Ruoqing Li | 2022-07-04 | Most of anomaly detection algorithms are mainl... |
| 3b3aefbbdb64e5812f133f220b3f129a36a30065 | Anomaly Detection via Reverse Distillation fro... | 49 | 16 | 32.653061 | https://www.semanticscholar.org/paper/3b3aefbb... | Hanqiu Deng | Hanqiu Deng, Xingyu Li | 2022-01-26 | Knowledge distillation (KD) achieves promising... |


In [22]:
for rowid, row in selection.iterrows():
    print(f"{row['authors_first']} ({row['publicationDate'].split('-')[0] if row['publicationDate'] else 'NOT FOUND'})")
    print(f"paperId={rowid}")
    print(f"{row['title']}")
    print(f"{row['authors_list']}")
    print(f"{row['nrefences']=}")
    print(f"{row['nreferences_incommon_mvtec_citations']=}")
    print(f"{row['percent_references_incommon_mvtec_citations']=:.0f}%")
    print(f"{row['url']}")
    print()
    print(f"{row['abstract']}")
    print()
    print()

Xian Tao (2022)
paperId=51ba3b33f445199d9f3cddb5b00c7e2927199b0c
Deep Learning for Unsupervised Anomaly Localization in Industrial Images: A Survey
Xian Tao, Xinyi Gong, X. Zhang, Shaohua Yan, Chandranath Adak
row['nrefences']=125
row['nreferences_incommon_mvtec_citations']=61
row['percent_references_incommon_mvtec_citations']=49%
https://www.semanticscholar.org/paper/51ba3b33f445199d9f3cddb5b00c7e2927199b0c

 Abstract — Currently, deep learning-based visual inspection has been highly successful with the help of supervised learning methods. However, in real industrial scenarios, the scarcity of defect samples, the cost of annotation, and the lack of a priori knowledge of defects may render supervised-based methods ineffective. In recent years, unsupervised anomaly localization algorithms have become more widely used in industrial inspection tasks. This paper aims to help researchers in this field by comprehensively surveying recent achievements in unsupervised anomaly localization in 

[The full output of the cell above is available here](https://docs.google.com/document/d/1ZELJgDTFbz91VtFT0jm30Druv6803mM1xRSovu2SZFc/edit?usp=sharing)
