# Web Crawling
Web crawling is the process of automatically collecting data from web pages with a crawler (or spider) program such as Googlebot. The result is a copy of the web content that can be indexed or analyzed.
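
A crawler moves from page to page by following links, so its core step is pulling the links out of a page it has already downloaded. The snippet below is a minimal sketch of that single step using only the Python standard library. The `LinkExtractor` class, the `extract_links` helper, and the sample HTML are illustrative names and data, not part of this notebook's workflow.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content, used here instead of a live download.
sample_html = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
print(extract_links(sample_html, "https://example.com/index.html"))
```

A full crawler would repeat this on every newly discovered link while keeping track of the pages it has already visited.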

## Steps
Choose a website to crawl. In this example we will retrieve journal articles from the Springer Nature website. Open https://dev.springernature.com/ and create an account to obtain the API key you need (for the Meta API), then copy the key and paste it into your code.

Next, enter the keyword you want to search for and run the code; it prints the total number of results for that keyword. Finally, write code that converts the output into a CSV file.
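
Pasting the key straight into a notebook makes it easy to share by accident. One possible alternative is to read it from an environment variable, as in the sketch below. The variable name `SPRINGER_API_KEY` and the helper `load_api_key` are assumptions made for this example, not something the Springer Nature API requires.

```python
import os


def load_api_key(env_var="SPRINGER_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key first.")
    return key
```

After setting the variable in your shell or notebook environment, `api_key = load_api_key()` can take the place of the hard-coded string.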

In [1]:
pip install sprynger

Collecting sprynger
  Downloading sprynger-0.4.1-py3-none-any.whl.metadata (5.8 kB)
Collecting lxml (from sprynger)
  Downloading lxml-6.0.1-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.metadata (3.8 kB)
Downloading sprynger-0.4.1-py3-none-any.whl (40 kB)
Downloading lxml-6.0.1-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl (5.3 MB)
Installing collected packages: lxml, sprynger
Successfully installed lxml-6.0.1 sprynger-0.4.1

[notice] A new release of pip is available: 25.1.1 -> 25.2
[notice] To update, run: python -m pip install --upgrade pip
Note: you may need to restart the kernel t

In [2]:
import requests

# Create your own API key at https://dev.springernature.com/#api
api_key = "YOUR_API_KEY"  # placeholder: paste your own key here

url = "https://api.springernature.com/meta/v2/json"
params = {
    "q": "web mining,web usage mining",  # search keywords
    "api_key": api_key,
    "p": 10,  # number of results to return
}

# A timeout keeps the cell from hanging if the server does not respond
response = requests.get(url, params=params, timeout=10)

if response.status_code == 200:
    data = response.json()
    print(f"Total results: {data['result'][0]['total']}\n")
    # Print the DOI, title, and abstract of each returned record
    for record in data['records']:
        doi = record.get('doi', 'N/A')
        title = record.get('title', 'No title')
        abstract = record.get('abstract', 'No abstract')
        print(f"DOI: {doi}")
        print(f"Title: {title}")
        print(f"Abstract: {abstract}\n")
else:
    print("Error:", response.status_code, response.text)

Total results: 1275

DOI: 10.1007/s41060-023-00483-9
Title: Artificial intelligence trend analysis in German business and politics: a web mining approach
Abstract: Current research on trend detection in artificial intelligence (AI) mainly concerns academic data sources and industrial applications of AI. However, we argue that industrial trends are influenced by public perception and political decisions (e.g., through industry subsidies and grants) and should be reflected in political data sources. To investigate this hypothesis, we examine the AI trend development in German business and politics from 1998 to 2020. Therefore, we propose a web mining approach to collect a novel data set consisting of business and political data sources combining 1.07 million articles and documents. We identify 246 AI-related buzzwords extracted from various glossaries. We use them to conduct an extensive trend detection and analysis study on the collected data using machine learning-based approaches. This 
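
The query above reports 1275 matches but asks for only 10 of them (`"p": 10`). Collecting more would take several requests. The sketch below builds the parameters for each request, assuming the API accepts `s` as a 1-based start index alongside `p`; check the Springer Nature API documentation before relying on it. The helper name `page_params` is illustrative.

```python
def page_params(query, api_key, total, page_size=10):
    """Yield one parameter dict per page of results.

    Assumes the API accepts `s` as a 1-based start index alongside `p`.
    """
    for start in range(1, total + 1, page_size):
        yield {"q": query, "api_key": api_key, "s": start, "p": page_size}


# 25 results at 10 per page need three requests, starting at 1, 11 and 21.
starts = [params["s"] for params in page_params("web mining", "YOUR_API_KEY", total=25)]
print(starts)
```

Each dict can be passed as `params` to `requests.get`, ideally with a short pause between calls to stay within the API's rate limits.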

In [4]:
import requests
import csv

api_key = "YOUR_API_KEY"  # placeholder: paste your own key here

url = "https://api.springernature.com/meta/v2/json"
params = {
    "q": "Jokowi",
    "api_key": api_key,
    "p": 10
}

response = requests.get(url, params=params, timeout=10)

if response.status_code == 200:
    data = response.json()

    # Open the CSV file, then write the header and one row per record
    with open("springer_results.csv", "w", newline="", encoding="utf-8") as csvfile:
        fieldnames = ["doi", "title", "abstract", "publicationName", "isbn", "url"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        for record in data.get("records", []):
            writer.writerow({
                "doi": record.get("doi", "N/A"),
                "title": record.get("title", "No title"),
                "abstract": record.get("abstract", "No abstract"),
                "publicationName": record.get("publicationName", ""),
                "isbn": record.get("isbn", ""),
                "url": record.get("url", "")
            })

    print("Success: results saved to springer_results.csv")
else:
    print("Error:", response.status_code, response.text)


Success: results saved to springer_results.csv


## The CSV file is saved automatically to the local directory
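
To check that the file really landed on disk, you can load it back and inspect a few rows. This is a minimal sketch using the standard `csv` module; the helper name `read_results` is illustrative, and the default path matches the filename written by the cell above.

```python
import csv


def read_results(path="springer_results.csv"):
    """Load the saved rows back as a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.DictReader(csvfile))
```

For example, `rows = read_results()` followed by `print(len(rows))` shows how many records were saved, and `rows[0]["title"]` gives the title of the first one.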