An AI-powered web scraper for Syria analysts built with Flask, React & Postgres.

Syria Daily Brief

An AI-powered web scraper built for Syria analysts and observers.
Explore the docs »

Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Screenshots
  3. Roadmap

About The Project

Tracking news is time-consuming, particularly when it means sifting through press releases and propaganda from the regime and non-state actors. As an analyst and perennial Syria-watcher, I want to build tools that ease this process: to streamline the gathering, translating and curating of data so more time can be spent analyzing trends and sentiments rather than monitoring social media feeds. Enter this project.

Syria Daily Brief is a web scraper designed to gather posts, articles and press announcements from a variety of Arabic-language sources run by governments, non-state actors and independent press. This project is designed to complement the work of Syria-focused analysts, particularly those who deal with large quantities of open-source data.

The project features a backend API written in Flask/Python and a frontend UI built in React/JavaScript/Tailwind CSS. It also features a PostgreSQL database with SQLAlchemy as an ORM and Marshmallow for JSON schema validation, as well as other Python libraries such as Beautiful Soup, Pandas and OpenAI to handle various features. While SDB is a standalone API in and of itself, you can view the frontend GUI here.

This project is currently in active development - check back soon for its first stable release.

(back to top)

Built With

Python, React.js, Flask, ChatGPT, Pandas

Additional libraries/tools: Marshmallow, SQLAlchemy, BeautifulSoup4

(back to top)

Screenshots

Roadmap

  • Collect data from a spectrum of Arabic-language websites:

    • Specify timespan for data collection (last 24 hrs, last week, last 6 months, etc.)
    • Gather data from a single source or cast a wide net to all available websites/outlets
    • Expand data collection to dozens of sources
  • Manage collected data:

    • Explore collected data through responsive UI
    • Search, tag and filter entries
    • Export data to .csv and .xlsx formats
  • Machine translations and summaries:

    • Utilize GPT-3.5/GPT-4 to summarize Arabic datasets
    • Get quick translations via ArgosTranslate
  • Use responsive UI to view, edit and manage data:

    • Save scraped data to personalized collections
    • View entries in sortable, editable database
    • Search data and tag entries of interest
    • Personalize data collection operations from frontend
    • Deploy project as offline, cross-platform Electron.js app
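
As a rough illustration of two roadmap items above (limiting collection to a timespan and exporting to .csv), here is a minimal sketch using only the Python standard library. It is not taken from the project's codebase: the `Entry` fields, the `TIMESPANS` labels and the function names `filter_by_timespan` and `to_csv` are all assumptions made for this example, and .xlsx export is not covered.

```python
import csv
import io
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta, timezone


# Hypothetical entry shape; the project's real database model may differ.
@dataclass
class Entry:
    source: str
    title: str
    published_at: datetime


# Illustrative timespan labels mirroring the roadmap's examples.
# "6months" is approximated as 182 days.
TIMESPANS = {
    "24h": timedelta(hours=24),
    "week": timedelta(weeks=1),
    "6months": timedelta(days=182),
}


def filter_by_timespan(entries, span, now=None):
    """Keep entries published within the chosen timespan before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - TIMESPANS[span]
    return [e for e in entries if cutoff <= e.published_at <= now]


def to_csv(entries):
    """Serialize entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["source", "title", "published_at"]
    )
    writer.writeheader()
    for entry in entries:
        row = asdict(entry)
        # Store timestamps as ISO 8601 strings so they survive a round trip.
        row["published_at"] = entry.published_at.isoformat()
        writer.writerow(row)
    return buffer.getvalue()
```

Calling `to_csv(filter_by_timespan(entries, "24h"))` would return CSV text for entries from the last 24 hours. Timestamps are assumed to be timezone-aware; mixing naive and aware datetimes raises a `TypeError` on comparison.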

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Justin Clark - @JustinClarkJO - jclarksummit AT gmail DOT com

Project Link: https://github.com/jclark1913/syria-daily-brief

(back to top)
