# Web Crawling
Web crawling is the process of automatically collecting data from web pages using a crawler/spider program (such as Googlebot). The result is a copy of the web content that can be indexed or analyzed.

## Steps
Choose a website to crawl. In this example we will retrieve journal articles from Springer Nature. Open https://dev.springernature.com/, create an account to obtain the API key you need (for the Meta API), then copy the key and paste it into your code.
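
Instead of pasting the key directly into a notebook cell, one option is to read it from an environment variable so it does not end up in a shared file. This is a minimal sketch: the variable name `SPRINGER_API_KEY` and the helper `load_api_key` are assumptions made for illustration and are not part of the Springer Nature API.

```python
import os


def load_api_key(env_var="SPRINGER_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With the variable set before starting the notebook, `api_key = load_api_key()` could replace the hard-coded string in the cells that follow.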

Next, enter the keyword you want to search for and run the code; the total number of results for that keyword will be displayed. Finally, write code that converts the output into a CSV file.

In [1]:
pip install sprynger


Note: you may need to restart the kernel to use updated packages.


In [2]:
import requests
# Create your own API key at https://dev.springernature.com/#api
api_key = "4b25841b8ccff165e4b251048a414037"
isbn = "978-3-031-63497-0"  # not used by the query below

url = "https://api.springernature.com/meta/v2/json"
params = {
    "q": "jokowi",       # search keyword
    "api_key": api_key,
    "p": 10              # number of results to return
}

# a timeout keeps the cell from hanging if the server does not respond
response = requests.get(url, params=params, timeout=10)

if response.status_code == 200:
    data = response.json()
    print(f"Total results: {data['result'][0]['total']}\n")
    for record in data['records']:
        doi = record.get('doi', 'N/A')
        title = record.get('title', 'No title')
        abstract = record.get('abstract', 'No abstract')
        print(f"DOI: {doi}")
        print(f"Title: {title}")
        print(f"Abstract: {abstract}\n")
else:
    print("Error:", response.status_code, response.text)

Total results: 900

DOI: 10.1007/s11196-025-10299-4
Title: The Language of Justice: Examining Courtroom Discourse in an Electoral Conflict
Abstract: Indonesia’s Presidential election in 2019 was a repeat contest between Joko Widodo (JM) as the incumbent, and Prabowo Subianto (PS) as the second-time contender. Once the manual counting of the votes was over, the General Election Committee declared that JM gained more than 55% of the votes; yet that count was challenged by PS. The issue was settled in the Constitutional Court of Indonesia. This study aims to discuss the courtroom dynamics of that dispute, using corpus-assisted methods to analyze a dataset consisting of all the official transcripts from the proceedings in the courts. The transcripts from all roles in the court (judges, lawyers, witnesses, and experts) were compiled as a corpus. The corpus was tokenized, annotated, indexed, and analyzed using LancsBox 6.0, a corpus query system that supports the Indonesian language, the langu
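
The request above asks for only 10 records (`p = 10`), while the API reports 900 matches. The loop below is a sketch of how further pages might be collected. It assumes the Meta API's `s` parameter is the 1-based index of the first result to return; `page_params` and `collect_records` are hypothetical helpers, and the function that actually performs the request is passed in so the paging logic can be tried without network access.

```python
def page_params(query, api_key, start, page_size=10):
    """Build query parameters for one page; `s` is assumed to be a 1-based start index."""
    return {"q": query, "api_key": api_key, "s": start, "p": page_size}


def collect_records(fetch, max_records=50, page_size=10):
    """Call fetch(start, page_size) until max_records are gathered or a page is empty.

    `fetch` must return the decoded JSON response as a dict.
    """
    records = []
    start = 1
    while len(records) < max_records:
        data = fetch(start, page_size)
        page = data.get("records", [])
        if not page:
            break  # no more results
        records.extend(page)
        start += len(page)  # move to the first record not yet seen
    return records[:max_records]
```

With `requests`, the fetcher could be written as `lambda s, p: requests.get(url, params=page_params("jokowi", api_key, s, p), timeout=10).json()`. Keeping `max_records` small helps stay within whatever rate limit applies to the key.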

In [3]:
import requests
import csv

api_key = "4b25841b8ccff165e4b251048a414037"

url = "https://api.springernature.com/meta/v2/json"
params = {
    "q": "web mining",
    "api_key": api_key,
    "p": 10
}

response = requests.get(url, params=params, timeout=10)

if response.status_code == 200:
    data = response.json()

    # open the CSV file, then write the header and the rows
    with open("springer_results.csv", "w", newline="", encoding="utf-8") as csvfile:
        fieldnames = ["doi", "title", "abstract", "publicationName", "isbn", "url"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        for record in data.get("records", []):
            writer.writerow({
                "doi": record.get("doi", "N/A"),
                "title": record.get("title", "No title"),
                "abstract": record.get("abstract", "No abstract"),
                "publicationName": record.get("publicationName", ""),
                "isbn": record.get("isbn", ""),
                "url": record.get("url", "")
            })

    print("Success: results saved to springer_results.csv")
else:
    print("Error:", response.status_code, response.text)


Success: results saved to springer_results.csv


## The CSV file is saved locally automatically