
GSoC 2023 ‐ Expanding Support for GCP Resources with Modularized Crawler

Sudipto Baral edited this page Aug 23, 2023 · 7 revisions


This document describes the Google Summer of Code 2023 project titled "Expanding Support for GCP Resources with Modularized Crawler."

Introduction

The GCP Scanner uses Google Cloud APIs to query a range of GCP resources. However, the current version supports scanning only 13 resources, even though Google Cloud offers more than 100 products. This project addresses that limitation by adding support for more GCP resources, which makes the scanner more useful.

In addition, all crawlers currently live in a single file, which becomes harder to scale and maintain as more crawlers are added. Refactoring is therefore needed to modularize the crawler, including adapting its unit tests so that the code stays maintainable over time.

Refactoring the Crawler

Previously, the crawl.py file contained all of the logic for crawling data from the GCP APIs. This project introduces a new crawl module that modularizes the existing crawler, which makes the code significantly easier to maintain and simplifies implementing new crawlers. For the implementation details, see Epic: Refactor the Crawler for modularity and better maintainability.
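To illustrate what a modular crawler layout can look like, here is a minimal Python sketch of an interface-plus-factory design. The names ICrawler, CrawlerFactory, ComputeInstancesCrawler and BucketsCrawler, the crawl(project_name, service) signature, and the FakeService stand-in are illustrative assumptions, not necessarily the exact code in the repository. Each crawler is a separate class (in practice, its own file), and the factory maps a resource name to the class that handles it.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type


class ICrawler(ABC):
    """Hypothetical common interface: every crawler exposes one crawl() method."""

    @abstractmethod
    def crawl(self, project_name: str, service: Any) -> List[Dict[str, Any]]:
        """Return the resources of one type found in the given project."""


class ComputeInstancesCrawler(ICrawler):
    """Hypothetical crawler; `service` is assumed to expose list_instances()."""

    def crawl(self, project_name, service):
        return service.list_instances(project_name)


class BucketsCrawler(ICrawler):
    """Hypothetical crawler; `service` is assumed to expose list_buckets()."""

    def crawl(self, project_name, service):
        return service.list_buckets(project_name)


class CrawlerFactory:
    """Maps a resource name to its crawler class (hypothetical registry)."""

    _crawlers: Dict[str, Type[ICrawler]] = {
        "compute_instances": ComputeInstancesCrawler,
        "storage_buckets": BucketsCrawler,
    }

    @classmethod
    def create_crawler(cls, name: str) -> ICrawler:
        try:
            return cls._crawlers[name]()
        except KeyError:
            raise ValueError(f"Unsupported crawler: {name}") from None


class FakeService:
    """Stand-in for a Google API client, used only for this example."""

    def list_instances(self, project_name):
        return [{"name": "vm-1", "project": project_name}]

    def list_buckets(self, project_name):
        return [{"name": "bucket-1", "project": project_name}]


crawler = CrawlerFactory.create_crawler("compute_instances")
print(crawler.crawl("demo-project", FakeService()))
# [{'name': 'vm-1', 'project': 'demo-project'}]
```

With this layout, supporting a new resource means adding one class and one registry entry, and each crawler can be unit-tested in isolation against a fake service.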

My contributions

From the outset, I organized the project's tasks. The planned code refactoring overlapped with another GSoC contributor's project, so to keep the work clearly structured, I outlined the plan and detailed all of the corresponding subtasks. This allowed me to collaborate with fellow participants and complete each subtask. Throughout the process, I regularly took part in peer reviews and shared my insights.

Accepted PRs

In addition, while refactoring the crawler, it became apparent that scanner.py could be improved by eliminating repetitive if-else statements. This change is tracked in a separate issue.
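The kind of simplification described above can be sketched as replacing a chain of if/elif branches with a lookup table that maps each resource name to the function that crawls it. The resource names, the crawl_* functions, the config dictionary, and the choice to treat missing config keys as enabled are all hypothetical; the real scanner.py may be organized differently.

```python
from typing import Any, Callable, Dict, List


# Hypothetical per-resource crawl functions (stand-ins for real API calls).
def crawl_compute_instances(project_id: str) -> List[str]:
    return [f"{project_id}/instance-1"]


def crawl_storage_buckets(project_id: str) -> List[str]:
    return [f"{project_id}/bucket-1"]


def crawl_dns_zones(project_id: str) -> List[str]:
    return [f"{project_id}/zone-1"]


# Before: one branch per resource, e.g.
#   if config.get("compute_instances"):
#       results["compute_instances"] = crawl_compute_instances(project_id)
#   if config.get("storage_buckets"):
#       ...
# After: a single table drives the loop, so adding a resource is one new entry.
CRAWLERS: Dict[str, Callable[[str], List[str]]] = {
    "compute_instances": crawl_compute_instances,
    "storage_buckets": crawl_storage_buckets,
    "dns_zones": crawl_dns_zones,
}


def scan_project(project_id: str, config: Dict[str, bool]) -> Dict[str, Any]:
    """Run every crawler enabled in `config` (missing keys count as enabled)."""
    results: Dict[str, Any] = {}
    for name, crawl in CRAWLERS.items():
        if config.get(name, True):
            results[name] = crawl(project_id)
    return results


print(scan_project("demo-project", {"dns_zones": False}))
# {'compute_instances': ['demo-project/instance-1'], 'storage_buckets': ['demo-project/bucket-1']}
```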

Add support for additional GCP resources

After outlining a strategy for incorporating additional crawlers, I added support for the following crawlers. If you are interested in adding new crawlers, refer to Epic: add support for additional GCP resources.
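Many Google Cloud list APIs return results in pages, so a new crawler built on the Python Google API client typically follows the client's list() / list_next() pattern. The sketch below assumes a Filestore-style projects().locations().instances() collection whose responses carry an "instances" key, and that "-" as the location means "all locations". The FilestoreInstancesCrawler class and these API details are assumptions for illustration; check the discovery document of the API you are actually adding. A small fake client stands in for googleapiclient.discovery.build(...) so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class FilestoreInstancesCrawler:
    """Hypothetical crawler for a paginated Filestore-style list API."""

    def crawl(self, project_name: str, service: Any) -> List[Dict[str, Any]]:
        # Assumption: "-" as the location lists instances in all locations.
        parent = f"projects/{project_name}/locations/-"
        collection = service.projects().locations().instances()
        request = collection.list(parent=parent)
        items: List[Dict[str, Any]] = []
        while request is not None:
            response = request.execute()
            items.extend(response.get("instances", []))
            # list_next() returns None once there are no more pages.
            request = collection.list_next(previous_request=request,
                                           previous_response=response)
        return items


# --- Offline stand-in for the API client (example only) ---
class _FakeRequest:
    def __init__(self, index: int):
        self.index = index

    def execute(self) -> Dict[str, Any]:
        return _FakeInstances.PAGES[self.index]


class _FakeInstances:
    """Serves two pages, mimicking nextPageToken-based paging."""

    PAGES = [
        {"instances": [{"name": "fs-1"}], "nextPageToken": "t1"},
        {"instances": [{"name": "fs-2"}]},
    ]

    def list(self, parent):
        return _FakeRequest(0)

    def list_next(self, previous_request, previous_response):
        if "nextPageToken" not in previous_response:
            return None
        return _FakeRequest(previous_request.index + 1)


fake_service = SimpleNamespace(
    projects=lambda: SimpleNamespace(
        locations=lambda: SimpleNamespace(instances=lambda: _FakeInstances())))

names = [i["name"] for i in FilestoreInstancesCrawler().crawl("demo", fake_service)]
print(names)  # ['fs-1', 'fs-2']
```

Looping until list_next() returns None keeps the crawler correct regardless of page size, so it does not silently drop resources in projects with many instances.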