jaxluthor/web_scrap_python

Content Monitoring Dashboard (Django Version)

A web-based monitoring application that crawls and scrapes content from trusted URLs, searches for keywords, and displays results in a beautiful monitoring dashboard. Now powered by Django for better scalability and robustness.

Features

  • 🔍 Keyword Monitoring: Search for keywords across all trusted sources
  • 🌐 URL Management: Add, view, and delete trusted sources
  • 📊 Statistics Dashboard: Track your monitoring activity
  • 🎯 Match Percentage: See relevance scores for each result
  • 💾 Database Storage: Unified SQLite database using Django ORM
  • 🎨 Modern UI: Beautiful dark-themed dashboard with smooth animations
  • 🛡️ Admin Interface: Built-in Django admin for data management

Tech Stack

  • Backend: Python 3.x, Django 5.x
  • Frontend: HTML5, CSS3, JavaScript (Vanilla)
  • Database: SQLite (via Django ORM)
  • Scraping: BeautifulSoup4, Requests
  • CORS: django-cors-headers

Installation

1. Activate Virtual Environment

source env/bin/activate

2. Install Dependencies

pip install -r requirements.txt

3. Run Migrations

python3 manage.py migrate

4. Create Admin User (Optional)

python3 manage.py createsuperuser

Follow the prompts to choose a username, email, and password for the admin account.

Usage

1. Start the Server

python3 manage.py runserver 3000

The server will start at http://localhost:3000

2. Add Trusted URLs

  • Navigate to the URLs tab in the dashboard
  • Add your trusted sources (websites you want to monitor)
  • Each URL should be a valid web page

3. Search for Keywords

  • Go to the Dashboard tab
  • Enter keywords you want to search for
  • Click Scan Now
  • View results with match percentages and snippets

4. Django Admin Interface

  • Visit http://localhost:3000/admin
  • Log in with your superuser credentials
  • Manage URLs, Scraped Content, and Search Results directly

Project Structure

scraper/
├── scraper_project/       # Django project configuration
├── scraper_app/           # Main application logic (Models, Views, Admin)
├── requirements.txt       # Python dependencies
├── script/
│   ├── scraper.py        # Web scraping and keyword search logic
│   └── filter.py         # Content filtering (future use)
├── frontend/
│   ├── index.html        # Main dashboard HTML (Django template)
│   ├── index.css         # Styles (Static file)
│   └── script.js         # Frontend JavaScript (Static file)
├── manage.py             # Django management script
└── db.sqlite3            # Unified SQLite database

API Endpoints

All API endpoints sit under the site root and keep the same structure as the previous (pre-Django) version:

  • POST /api/search - Trigger keyword search
  • GET /api/articles - Get search results
  • GET /api/urls - Manage trusted URLs
  • GET /api/statistics - Get monitoring statistics
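As a quick sketch, a keyword scan can be triggered from a script against the running dev server. The endpoint path and port come from this README; the JSON field name `keyword` is an assumption, so check the actual view in scraper_app before relying on it:

```python
import json
import urllib.request

API_BASE = "http://localhost:3000"  # dev server port from the Usage section

def build_search_request(keyword):
    """Build a POST /api/search request.

    The 'keyword' JSON field name is an assumption -- verify it against
    the view code in scraper_app.
    """
    body = json.dumps({"keyword": keyword}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the dev server running (python3 manage.py runserver 3000), send it with:
#   urllib.request.urlopen(build_search_request("django"))
req = build_search_request("django")
print(req.full_url, req.get_method())
```

The same pattern works for the GET endpoints (`/api/articles`, `/api/urls`, `/api/statistics`) with `method="GET"` and no body.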

How It Works

  1. URL Storage: Trusted URLs are stored in the Django database using the TrustedURL model.
  2. Scraping: When you search for a keyword, the system:
    • Fetches content from URLs defined in the database.
    • Parses HTML structure (headings, paragraphs).
    • Searches for keyword matches.
  3. Scoring: Match percentage is calculated based on keyword prominence and frequency.
  4. Storage: Scraped content and search results are stored in the database for future reference.
  5. Display: Results are shown in the dashboard with snippets and links.
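The parse-and-score steps above can be sketched in miniature. The project's real logic lives in script/scraper.py and uses BeautifulSoup4 and Requests; this standalone sketch substitutes the stdlib html.parser, and the 3x heading weight and the saturation point in the score are illustrative assumptions, not the project's actual formula:

```python
from html.parser import HTMLParser

class BlockExtractor(HTMLParser):
    """Collects text from headings and paragraphs, as step 2 describes."""
    def __init__(self):
        super().__init__()
        self._tag = None
        self.headings = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "p"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag == "p":
            self.paragraphs.append(text)
        else:
            self.headings.append(text)

def match_percentage(keyword, headings, paragraphs):
    """Toy relevance score: heading hits count more than body hits.

    The 3x heading weight and the 100% saturation are illustrative
    choices, not the formula used in script/scraper.py.
    """
    kw = keyword.lower()
    heading_hits = sum(h.lower().count(kw) for h in headings)
    body_hits = sum(p.lower().count(kw) for p in paragraphs)
    score = 3 * heading_hits + body_hits
    return min(100, score * 10)

page = "<h1>Django news</h1><p>Django 5 released.</p><p>No match here.</p>"
parser = BlockExtractor()
parser.feed(page)
print(match_percentage("django", parser.headings, parser.paragraphs))  # → 40
```

In the real application the fetched HTML, the parsed blocks, and the computed scores are persisted via the Django ORM (step 4) rather than kept in memory.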
