- Name - Sandagomi Pieris
- Email - npieris73@gmail.com
- GitHub Profile - https://github.com/sandagomipieris
CrawlerX is a platform for crawling web URLs over different kinds of protocols in a distributed way. Web crawling, often called web scraping, is a method of programmatically going over a collection of web pages and extracting data, which is useful for analyzing web-based data. With a web scraper, you can mine data about a set of products, get a large corpus of text or quantitative data to play around with, get data from a site without an official API, or just satisfy your own curiosity. CrawlerX was originally designed to run in a VM-based environment with limited functionality. This project removes that limitation by extending the platform to containerized environments.
- Pod Deployment
- Service Deployment
- Ingress Deployment
- ConfigMap Deployment
This project mainly focuses on deploying the CrawlerX web platform on Kubernetes. As per the details provided by the SCoRe organization mentors, CrawlerX needs to be deployed as an on-demand platform. After investigation, Helm was chosen to implement this requirement. Helm helps you manage Kubernetes applications as charts, which are easy to create, version, share, publish, and unpublish. Users can now deploy CrawlerX on a K8s environment with a single command, as follows.

`helm install <RELEASE_NAME> <HELM_HOME> --namespace <NAMESPACE> --dependency-update --create-namespace`
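
To make the placeholders concrete, here is a minimal sketch of the install command with values filled in, followed by two checks on the result. The release name `crawlerx`, the namespace `crawlerx`, and the chart path `./helm` (standing in for `<HELM_HOME>`) are assumptions for illustration, not names taken from the project. The small `run` helper is also hypothetical: it prints each command and executes it only when `DRY_RUN=0`, so the script is safe to try without a cluster.

```shell
#!/bin/sh
# Hypothetical values; substitute your own release, chart path, and namespace.
RELEASE_NAME="crawlerx"
CHART_DIR="./helm"        # stands in for <HELM_HOME>
NAMESPACE="crawlerx"

# Print a command, and execute it only when DRY_RUN is explicitly set to 0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Install the chart, fetching chart dependencies and creating the namespace if needed.
run helm install "$RELEASE_NAME" "$CHART_DIR" \
  --namespace "$NAMESPACE" --dependency-update --create-namespace

# Check the release status and list the pods it created.
run helm status "$RELEASE_NAME" --namespace "$NAMESPACE"
run kubectl get pods --namespace "$NAMESPACE"

# To tear the on-demand deployment down again:
#   helm uninstall "$RELEASE_NAME" --namespace "$NAMESPACE"
```

Running the script as is only prints the commands; `DRY_RUN=0 sh install.sh` (with the script saved under that hypothetical name) would execute them. Deployment parameters can be overridden at install time by passing your own values file with Helm's `--values` flag.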
- Add Helm chart for CrawlerX platform
- Add K8s artifacts for Vue.js-based frontend server deployment
- Add K8s artifacts for Django backend server deployment
- Add K8s artifacts for Celery beat deployment
- Add K8s artifacts for Celery worker deployment
- Add K8s artifacts for Scrapy crawler deployment
- Add MongoDB, RabbitMQ and Elasticsearch deployments as chart dependencies
- Add K8s secret artifacts to pull private images for the pods
- Add ConfigMaps for each deployment
- Configure values.yaml to customize deployment parameters
- Document the K8s deployment
- Test on a local Minikube cluster
- Test on Google Kubernetes Engine
- Integrate Grafana dashboard
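
The local Minikube testing listed above could be prepared roughly as follows. This is a sketch under stated assumptions, not the project's actual test procedure: the chart path `./helm`, the release name, and the namespace are hypothetical, and the ingress add-on is enabled only because the deployment includes an Ingress. The hypothetical `run` helper prints each command and executes it only when `DRY_RUN=0`.

```shell
#!/bin/sh
# Hypothetical values; adjust to your checkout of the chart.
RELEASE_NAME="crawlerx"
CHART_DIR="./helm"
NAMESPACE="crawlerx"

# Print a command, and execute it only when DRY_RUN is explicitly set to 0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Start a local cluster and enable the ingress controller add-on.
run minikube start
run minikube addons enable ingress

# Fetch the chart dependencies declared by the chart.
run helm dependency update "$CHART_DIR"

# Check the chart for problems, then render the manifests without installing them.
run helm lint "$CHART_DIR"
run helm template "$RELEASE_NAME" "$CHART_DIR" --namespace "$NAMESPACE"
```

Rendering with `helm template` before installing makes it possible to inspect the generated manifests and confirm that settings from `values.yaml` end up where expected.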