Skip to content

mustafaelghrib/scraple

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Scraple

A web scrapper and deposit system data pipeline!

Features:

  • Scrape data from urls, html file and xml files
  • Let users deposit their data through a deposit system
  • Data pipeline for extracting, cleaning and storing data in database

Tech Stacks

  • Backend: Python, Django, PostgreSQL
  • Infrastructure: Terraform, Google Cloud Compute Instance
  • Deployment: Nginx through Linux Bash Script

Backend:

Run The Backend

chmod +x ./scripts/run_backend.sh && ./scripts/run_backend.sh

Run The Tests:

  • Run Pytest:
    .venv/bin/pytest -rP
  • Run Pytest Coverage:
    .venv/bin/pytest --cov=backend

Docs:

  • Check Docs Coverage:
    .venv/bin/interrogate -v backend
  • Check Docs Style:
    .venv/bin/pydocstyle backend
  • Show Docs Locally:
    .venv/bin/mkdocs serve --dev-addr 127.0.0.1:9000
  • Deploy Docs to GitHub Pages:
    .venv/bin/mkdocs gh-deploy

Infrastructure:

Setup Terraform Backend and Secrets:

  • Create GCP project and get the project id
  • Create a GCP storage and get the bucket name
  • Download a service key file and rename it to infrastructure/.gcp_creds.json
  • Copy infrastructure/.backend.hcl.sample and rename it to infrastructure/.backend.hcl
  • Copy infrastructure/.secrets.auto.tfvars.sample and rename it to infrastructure/.secrets.auto.tfvars

Setup SSH:

  • Generate an SSH Key.
  • Create the folder infrastructure/.ssh and copy id_rsa.pub and id_rsa inside it

Run Terraform Commands:

  • Create an alias for terraform command
    alias TF=docker compose -f infrastructure/.docker-compose.yml run --rm terraform
  • terraform init
    TF init -backend-config=.backend.hcl
  • terraform apply gcp
    TF apply -target="module.gcp" --auto-approve
  • terraform destroy gcp
    TF destroy -target="module.gcp" --auto-approve
  • terraform output gcp
    TF output gcp