
In this project, a Postgres Cloud SQL instance is set up and connected to a Dockerized pgAdmin container. Scraped data is written into the database and accessed from both a local machine and a remote host.

WebData->ComputeEngine->PostgresDB

In this project, an Extract, Transform, and Load (ETL) pipeline application, along with all its dependencies, is packaged and containerized into a Docker container.

The project focuses on deploying the scraper application built in Project One on Google Cloud Run, a managed compute platform that lets you run stateless containers.

The project is the fifth installment in my "Building your first Google Cloud Analytics Project" series.

It is also a direct sequel to the third project.

Link to the Yahoo Finance Website

Link to the Medium Article

Project Structure

  • Introduction
  • Setting up the Environment
  • Setting up PGADMIN
  • Building the Docker Image
  • Deploying the Container on Cloud Run
  • Conclusion


Set up your environment

Clone the Github Repo

   git clone https://github.com/paulonye/DockerXPostgres

Install the Required Libraries

   pip install -r requirements.txt

Set up the .env file

   PGUSER=user
   PGPASS=******
   HOST=**********
   DB=database_name
   key_file=key.json
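These variables are presumably read by the application at runtime. As a rough sketch (stdlib only, helper names hypothetical), parsing a `.env` file of this shape and assembling a Postgres connection URL from it might look like:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

def pg_url(env=os.environ):
    """Assemble a Postgres connection URL from the variables above."""
    return "postgresql://{PGUSER}:{PGPASS}@{HOST}/{DB}".format(**env)
```

For example, `pg_url({"PGUSER": "user", "PGPASS": "pw", "HOST": "1.2.3.4", "DB": "db"})` yields `postgresql://user:pw@1.2.3.4/db`. In practice the repo likely uses a library such as python-dotenv for this.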

Migrate Google Sheets Data to PostgresDB

python app/batch.py
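`app/batch.py` presumably reads the scraped rows from Google Sheets and writes them to Postgres in batches; the actual code lives in the repo, but the core batching logic might look like this sketch (function and table names hypothetical, stdlib only):

```python
def chunked(rows, size=500):
    """Yield successive fixed-size batches so each insert stays small."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def insert_sql(table, columns):
    """Build a parameterized INSERT suitable for cursor.executemany()."""
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
```

Each batch from `chunked(rows)` would then be passed to `cursor.executemany(insert_sql(...), batch)` on a psycopg2 connection built from the `.env` settings.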

Build the Docker Image

cd into the directory of the cloned repo, open the Dockerfile, and make any changes you need. It is well documented, so just follow along.

Some changes to watch out for:

  • Directory and Name of the service account.
  • Name of the environment variable for the service account.
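For reference, a Dockerfile for a Python scraper of this shape might look like the minimal sketch below; the base image, key filename, environment-variable name, and entrypoint are assumptions to be adjusted per the list above.

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install dependencies first so this layer caches between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and the service-account key
COPY . .

# Name of the environment variable the app reads for the key file
ENV key_file=key.json

CMD ["python", "app/batch.py"]
```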

Once this is done, you can build the Docker image using:

    docker build -t image_name .

The above command builds the docker image; to test that it works, use:

   docker run image_name

Once you are sure that it works, go ahead and set up the artifact registry as described in the medium article above.

Pushing the Docker Container Image to Artifact Registry

Authenticate to the Region where your Artifact Registry is located

   gcloud auth configure-docker us-central1-docker.pkg.dev

Build the Docker Image for Artifact Registry

   docker build -t us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag1 .

Here my-project is your GCP project ID and my-repo is the name of the repository you created in Artifact Registry.

Push the Docker Image to Artifact Registry

   docker push us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag1

Deploy the Container on Cloud Run

   gcloud beta run jobs create job-quickstart --image us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag1 --region us-central1
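Note that the `--image` flag must reference the image as pushed to Artifact Registry, not a local image name. Once the job is created, it can be triggered on demand to confirm the container runs end to end:

```shell
# Run the Cloud Run job once and wait for it to finish
gcloud beta run jobs execute job-quickstart --region us-central1
```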

Schedule a Cron job using Google Cloud Scheduler
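Cloud Scheduler can invoke the Cloud Run job on a cron schedule via its REST run endpoint. A sketch, assuming a service account `scheduler-sa@my-project.iam.gserviceaccount.com` that has permission to invoke the job:

```shell
# Run the scraper job every day at 06:00
gcloud scheduler jobs create http scrape-daily \
  --location us-central1 \
  --schedule "0 6 * * *" \
  --uri "https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/my-project/jobs/job-quickstart:run" \
  --http-method POST \
  --oauth-service-account-email scheduler-sa@my-project.iam.gserviceaccount.com
```

See the Medium article linked above for the full walkthrough of this step.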
