This document outlines the process, approach, and execution of the assigned task for the Data Source Analyst position. The objective is to showcase skills in API research, data extraction, and technical documentation.
Repository: https://github.com/maep13/Data-Source-API-Analyst-Test
The primary goal is to simulate a client requirement involving the extraction of specific data from GitHub's public API. The project assesses the following competencies:
- API Research: Ability to navigate technical documentation and understand endpoint logic, authentication, pagination, and limits.
- Data Extraction: Use of technical tools to make API calls and retrieve the requested data.
- Documentation & Troubleshooting: Capability to clearly document the process and anticipate potential issues with corresponding solutions.
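To make the pagination point concrete, here is a minimal sketch of how a paginated GitHub endpoint could be walked by following the `Link` response header. It is not the code from the repository's notebook. The function names (`parse_next_link`, `http_get`, `fetch_all`) and the `max_pages` safety cap are assumptions introduced for illustration, and only the Python standard library is used.

```python
import json
import re
import urllib.request


def parse_next_link(link_header):
    """Return the URL marked rel="next" in a Link header, or None if absent."""
    if not link_header:
        return None
    # Each entry in the header looks like: <https://...>; rel="next"
    for match in re.finditer(r'<([^>]+)>;\s*rel="([^"]+)"', link_header):
        if match.group(2) == "next":
            return match.group(1)
    return None


def http_get(url, token=None):
    """Fetch one page and return (parsed JSON body, Link header or None)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response), response.headers.get("Link")


def fetch_all(url, get_page=http_get, max_pages=10):
    """Collect items from a list endpoint, following rel="next" links.

    get_page is injectable so the loop can be exercised without network access.
    max_pages is a hypothetical cap that keeps a sketch from running unbounded.
    """
    items = []
    pages = 0
    while url and pages < max_pages:
        body, link_header = get_page(url)
        items.extend(body)
        url = parse_next_link(link_header)
        pages += 1
    return items
```

As a usage example, `fetch_all("https://api.github.com/repos/octocat/Hello-World/commits?per_page=100")` would gather up to ten pages of commits; the repository in that URL is only a placeholder. Unauthenticated requests are subject to a much lower rate limit, so a token can be supplied with something like `get_page=lambda u: http_get(u, token=my_token)`.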
This task uses Google Colab with Python as the primary tool for data extraction, for the following reasons:
- Power and Flexibility: Compared to GUI tools like Postman, Python provides full control over extraction logic, complex error handling, pagination, and data transformation.
- Best Practices: It enables secure coding practices, such as managing authentication tokens through "Secrets" to prevent exposing sensitive data.
- Task Requirement: The task description itself awards bonus points for using Google Colab, treating it as a sign of stronger technical proficiency.
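The "Secrets" practice mentioned above might look like the following sketch. It assumes the token is stored under the name `GITHUB_TOKEN`, which is a hypothetical choice, and it falls back to an environment variable so the same code also runs outside Colab. The helper names `load_github_token` and `auth_headers` are illustrative and not taken from the notebook.

```python
import os


def load_github_token(name="GITHUB_TOKEN"):
    """Return the token from Colab Secrets, else the environment, else None.

    The secret name is an assumption; use whatever name the secret was saved under.
    """
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Secret missing or notebook access not granted: try the environment.
        return os.environ.get(name)


def auth_headers(token):
    """Build request headers; Authorization is added only when a token exists."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Because the token is read at runtime, it never appears in the notebook source or in the committed `.ipynb` file, provided the cell output does not print it.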
The final structure of the repository includes:
- /Content/TROUBLESHOOTING.md: A guide detailing common API errors and their resolutions.
- /Content/DATA_CLEANING.md: A document explaining the approach for cleaning and processing extracted data.
- /Postman_Collection/github_api_extraction.ipynb: The Google Colab notebook with the final Python script used for data extraction.
- README.md: This central document guiding the entire project.
This task has been an excellent practical exercise that realistically simulates the lifecycle of a data extraction requirement. Investigating an API’s documentation, structuring modular code, and documenting both the workflow and potential contingencies reinforces the importance of a methodical approach.
Google Colab proved to be an ideal choice, not only for writing and testing Python code easily but also for security features like "Secrets," which are essential for professional credential management.
Overall, the project has been a valuable opportunity to demonstrate the technical and analytical skills needed for the Data Source Analyst role.