subinyy/IS_NetShield

🛡️ IS_NetShield

Intelligent Real-Time Harmful URL Detection System

A security system that uses machine learning to analyze and block malicious URLs in real time


🔄 System Pipeline

flowchart LR
    A([🌐 URL input]) --> B[/Feature extraction\nfeature_engineering.py/]
    
    B --> C1[URL length calculation]
    B --> C2[Special character analysis]
    B --> C3[Domain analysis]
    B --> C4[Keyword detection]

    C1 & C2 & C3 & C4 --> D[(25-feature vector)]

    D --> E[🤖 XGBoost model\ntrain_model.py]

    E --> F{Prediction result}
    F -->|✅ Benign| G[allow]
    F -->|⚠️ Warning| H[alert]
    F -->|🚫 Dangerous| I[block]

    E --> K[📊 Performance evaluation\nmodel_evaluation.png]

    style A fill:#4CAF50,color:#fff
    style E fill:#FF6600,color:#fff
    style F fill:#2196F3,color:#fff
    style G fill:#4CAF50,color:#fff
    style H fill:#FF9800,color:#fff
    style I fill:#F44336,color:#fff

🗂️ Project Structure

IS_NetShield/
├── 📁 src/
│   ├── 🔧 feature_engineering.py   # URL → feature extraction
│   ├── 🤖 train_model.py           # XGBoost training and evaluation
│   └── 🚀 api_server.py            # FastAPI server (planned)
├── 📁 data_mal/
│   ├── 📄 malicious_phish.csv      # Mixed benign/malicious URL dataset (Kaggle)
│   └── 📄 online-valid.csv         # Live phishing URLs (PhishTank, for comparison and validation)
├── 📁 model/
│   └── 💾 xgb_model.pkl            # Trained model
├── 📁 results/
│   └── 📊 model_evaluation.png     # Visualization of model evaluation results
├── 🚫 .gitignore
└── 📖 README.md

🚀 Getting Started

1️⃣ Install packages

pip install xgboost scikit-learn pandas numpy matplotlib seaborn requests

2️⃣ Download the datasets

[1] Kaggle - malicious_phish.csv:
🔗 https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset

Columns: url, type (benign / phishing / malware / defacement)

[2] PhishTank (Cisco Talos) - CC BY-SA 2.5
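
The Kaggle file carries four `type` values, while the confusion matrix reported in this README is binary, so the labels presumably get collapsed to benign versus malicious before training. A minimal sketch of that step, assuming every non-benign type counts as malicious. The repository's actual preprocessing is not shown here, and `to_binary_label`, `load_labeled_urls`, and the sample rows are hypothetical names and data used only for illustration:

```python
import csv
import io

# Assumption: every non-benign type is treated as malicious.
MALICIOUS_TYPES = {"phishing", "malware", "defacement"}


def to_binary_label(url_type: str) -> int:
    """Return 0 for benign and 1 for any malicious type."""
    normalized = url_type.strip().lower()
    if normalized == "benign":
        return 0
    if normalized in MALICIOUS_TYPES:
        return 1
    raise ValueError(f"unexpected type: {url_type!r}")


def load_labeled_urls(csv_file):
    """Read rows with `url` and `type` columns into (url, label) pairs."""
    reader = csv.DictReader(csv_file)
    return [(row["url"], to_binary_label(row["type"])) for row in reader]


# Made-up sample rows in the same two-column layout as malicious_phish.csv.
SAMPLE_CSV = """url,type
example.com/index.html,benign
paypa1-secure.xyz/login/verify,phishing
example.org/setup.exe,malware
"""

if __name__ == "__main__":
    for url, label in load_labeled_urls(io.StringIO(SAMPLE_CSV)):
        print(label, url)
```

For the real dataset, the same function could be called with an open file handle such as `open("data_mal/malicious_phish.csv", newline="", encoding="utf-8")`.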

3️⃣ Train the model

python train_model.py

📊 Model Performance Results

[Figure: model_evaluation.png]

Full dataset: 651,191 URLs (Kaggle Malicious URL Dataset)
Test data: 130,239 URLs (20% of the total)

| Metric | Value |
| --- | --- |
| Accuracy | 0.9408 |
| Precision | 0.9006 |
| Recall | 0.9296 |
| F1 Score | 0.9149 |
| ROC-AUC | 0.9860 |
| False Positive Rate | 0.0534 |

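Confusion Matrix Interpretation

| | Predicted benign | Predicted malicious |
| --- | --- | --- |
| Actual benign | 81,045 ✅ | 4,576 ❌ |
| Actual malicious | 3,140 ❌ | 41,478 ✅ |

  • False positives (benign → malicious): 4,576 (5.3% of actual benign URLs)
  • False negatives (malicious → benign): 3,140 (7.0% of actual malicious URLs)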
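
The reported metrics can be cross-checked directly from the four confusion-matrix counts. The sketch below uses only the standard definitions. `binary_metrics` is a hypothetical helper name, not a function from this repository. ROC-AUC is left out because it depends on the predicted scores rather than on the counts.

```python
def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Counts taken from the confusion matrix in this README.
metrics = binary_metrics(tn=81_045, fp=4_576, fn=3_140, tp=41_478)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the computed accuracy, precision, recall, F1, and false positive rate agree with the table, and the false negative rate comes out at about 7.0%.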

Feature Importance Top 5

| Rank | Feature | Meaning |
| --- | --- | --- |
| 1 | domain_length | Malicious URLs tend to have long domains |
| 2 | has_www | Odd subdomains used without www |
| 3 | subdomain_depth | Deeper subdomain nesting is more suspicious |
| 4 | path_depth | More complex paths are more suspicious |
| 5 | tld_risk | Use of high-risk TLDs such as .tk and .xyz |

🧪 Extracted Features (25)

| Category | Features |
| --- | --- |
| 📏 URL length | url_length, domain_length, path_length, query_length |
| 🔣 Special characters | count_dots, count_hyphens, count_at, count_percent, etc. |
| 🌍 Domain | subdomain_depth, has_ip_address, tld_risk |
| 🔒 Protocol | is_https |
| 🔍 Keywords | has_phishing_keyword, has_brand_keyword |
| 🧩 Patterns | has_typosquatting, has_double_slash |
| 📐 Statistics | url_entropy, digit_ratio, path_depth |
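
To make the feature names more concrete, here is a minimal standard-library sketch of how some of them could be computed from a raw URL. It is an illustration under assumptions, not the contents of feature_engineering.py. The exact definitions, the `HIGH_RISK_TLDS` set (only .tk and .xyz are named in this README), and the naive subdomain counting are all guesses. The keyword and typosquatting features are omitted because their word lists are not documented here.

```python
import ipaddress
import math
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: illustrative set only; the README names just .tk and .xyz.
HIGH_RISK_TLDS = {"tk", "xyz"}
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def is_ip_address(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    # urlsplit only recognizes the host after "//", so scheme-less URLs
    # get a "//" prefix before parsing.
    parts = urlsplit(url if SCHEME_PREFIX.match(url) else "//" + url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    has_ip = is_ip_address(host)
    return {
        "url_length": len(url),
        "domain_length": len(host),
        "path_length": len(parts.path),
        "query_length": len(parts.query),
        "count_dots": url.count("."),
        "count_hyphens": url.count("-"),
        "count_at": url.count("@"),
        "count_percent": url.count("%"),
        # Naive depth: labels left of the last two (ignores suffixes like co.uk).
        "subdomain_depth": 0 if has_ip else max(len(labels) - 2, 0),
        "has_ip_address": int(has_ip),
        "tld_risk": int(not has_ip and bool(labels) and labels[-1] in HIGH_RISK_TLDS),
        "is_https": int(parts.scheme == "https"),
        "url_entropy": shannon_entropy(url),
        "digit_ratio": sum(c.isdigit() for c in url) / len(url) if url else 0.0,
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


if __name__ == "__main__":
    for name, value in extract_features("http://paypa1-secure.xyz/login/verify").items():
        print(f"{name}: {value}")
```

A model trained on such features would need the dictionary values arranged in one fixed column order for both training and inference.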

⚖️ Comparison Targets (Optional/Planned)

| Model | Characteristics | Type |
| --- | --- | --- |
| 🥇 Our model | XGBoost + 25 engineered features, local inference | Local ML |
| 🔵 Google Safe Browsing | Industry standard, free API | Cloud API |
| 🟠 VirusTotal | Ensemble of 70 engines, used as ground truth | Cloud API |

🔌 API Server (Planned)

We plan to build a FastAPI-based REST API server to provide a real-time URL detection service.

# Install packages
pip install fastapi uvicorn

# Run the server
uvicorn api_server:app --reload --host 0.0.0.0 --port 8000

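Planned endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /analyze | Analyze a single URL |
| POST | /analyze/batch | Analyze multiple URLs in one request (up to 100) |
| GET | /health | Check server status |
| GET | /stats | Retrieve model information |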
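
Because the batch endpoint is planned with a limit of 100 URLs, a client with a longer list would have to split it into several requests. The sketch below shows one way to do that with the standard library. The request body shape `{"urls": [...]}` and the helper names are assumptions, since the API is not implemented yet; the port matches the `uvicorn` command above.

```python
import json
from urllib.request import Request

MAX_BATCH_SIZE = 100  # limit stated for POST /analyze/batch


def chunk_urls(urls, size=MAX_BATCH_SIZE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_batch_requests(urls, base_url="http://localhost:8000"):
    """Build one POST request per chunk; nothing is sent here."""
    requests = []
    for chunk in chunk_urls(urls):
        # Assumed payload shape; the real schema is not defined in this README.
        body = json.dumps({"urls": chunk}).encode("utf-8")
        requests.append(
            Request(
                f"{base_url}/analyze/batch",
                data=body,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return requests


if __name__ == "__main__":
    sample = [f"http://example.com/{i}" for i in range(250)]
    for req in build_batch_requests(sample):
        print(req.get_method(), req.full_url, len(json.loads(req.data)["urls"]))
```

Once a server exists, each request could be sent with `urllib.request.urlopen(req)`.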

응닡 μ˜ˆμ‹œ

{
  "url": "http://paypa1-secure.xyz/login/verify",
  "score": 99,
  "verdict": "block",
  "label": "dangerous",
  "reasons": ["contains phishing keyword", "high-risk TLD domain", "unencrypted HTTP"],
  "response_time_ms": 12.4,
  "timestamp": "2026-04-03T18:00:00"
}
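
One possible way to produce a response of this shape is to scale the model's malicious-class probability to a 0-100 score and map score ranges to the three verdicts. The sketch below is hypothetical throughout. The thresholds, the probability-to-score scaling, the English label strings, and the names `verdict_for`, `build_result`, and `AnalysisResult` are assumptions, since the README does not define how `score` or `verdict` are derived.

```python
from dataclasses import asdict, dataclass
from datetime import datetime

# Hypothetical cut-offs on the 0-100 score; the README does not state them.
ALERT_THRESHOLD = 50
BLOCK_THRESHOLD = 80

# Assumed English labels for the three verdicts.
VERDICT_LABELS = {"allow": "benign", "alert": "warning", "block": "dangerous"}


def score_from_probability(p_malicious: float) -> int:
    """Assumed scaling: clamp the malicious-class probability and map it to 0-100."""
    return round(max(0.0, min(1.0, p_malicious)) * 100)


def verdict_for(score: int) -> str:
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score >= ALERT_THRESHOLD:
        return "alert"
    return "allow"


@dataclass
class AnalysisResult:
    url: str
    score: int
    verdict: str
    label: str
    reasons: list
    response_time_ms: float
    timestamp: str


def build_result(url, p_malicious, reasons, response_time_ms, now=None):
    """Assemble a response object with the same fields as the example above."""
    score = score_from_probability(p_malicious)
    verdict = verdict_for(score)
    timestamp = (now or datetime.now()).isoformat(timespec="seconds")
    return AnalysisResult(
        url, score, verdict, VERDICT_LABELS[verdict],
        list(reasons), response_time_ms, timestamp,
    )


if __name__ == "__main__":
    result = build_result(
        "http://paypa1-secure.xyz/login/verify", 0.99,
        ["high-risk TLD domain"], 12.4,
    )
    print(asdict(result))
```

Keeping the thresholds as named constants makes it easy to tune the trade-off between false positives and false negatives later.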

☁️ AWS Deployment Architecture (Planned)

We plan to build a realistic security perimeter with a combination of EC2, ALB, and WAF.

User / simulated attacker
        ↓
  Route 53 (DNS)
        ↓
  ALB (load balancer)
        ↓
  AWS WAF (first-stage rule-based blocking)
        ↓
  EC2 detection engine (FastAPI + XGBoost)
        ↓
  S3 (log storage) + CloudWatch (monitoring)

Planned components

| Service | Role |
| --- | --- |
| EC2 | Hosts the FastAPI server and the XGBoost model |
| ALB | Traffic distribution and HTTPS handling |
| AWS WAF | IP blocking and first-pass filtering of known malicious patterns |
| S3 | Storage for detection logs and model artifacts |
| CloudWatch | Real-time monitoring and alarms |

🗺️ Development Roadmap

✅ Phase 1  ML model training and evaluation  - Done
🔄 Phase 2  FastAPI server build              - Planned
⏳ Phase 3  AWS deployment (EC2+ALB+WAF)      - Planned
⏳ Phase 4  React dashboard UI                - Planned

🔐 Security Project | Information Security Class
