Repo Analyzer

Table Of Contents

  1. Introduction
  2. Architecture
  3. Installation
  4. Usage

1. Introduction

In this challenge, we download 100,000 public GitHub repositories and process the downloaded code. For each repository, the goal is to compute the following statistics for the Python code it contains (a sketch of how two of these can be computed appears after the list).

  1. Number of lines of code [excluding comments, whitespace, and blank lines].
  2. List of external libraries/packages used.
  3. The nesting factor for the repository: the average depth of nested for loops throughout the code.
  4. Code duplication: the percentage of code that is duplicated per file. If the same 4 consecutive lines of code (disregarding blank lines, comments, and other non-code items) appear in multiple places in a file, every occurrence except the first is considered a duplicate.
  5. Average number of parameters per function definition in the repository.
  6. Average number of variables defined per line of code in the repository.
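
For illustration, statistics 3 and 5 can be computed with Python's ast module roughly as follows; this visitor is a minimal sketch with assumed names, not the repository's actual Analyzer implementation.

import ast

class StatsVisitor(ast.NodeVisitor):
    """Illustrative visitor: records parameters per function and for-loop depths."""

    def __init__(self):
        self.param_counts = []  # parameters per function definition
        self.loop_depths = []   # depth at which each for loop appears
        self._depth = 0

    def visit_FunctionDef(self, node):
        self.param_counts.append(len(node.args.args))
        self.generic_visit(node)

    def visit_For(self, node):
        self._depth += 1
        self.loop_depths.append(self._depth)
        self.generic_visit(node)
        self._depth -= 1

source = """
def pairs(items, sep):
    for a in items:
        for b in items:
            print(a, sep, b)
"""
v = StatsVisitor()
v.visit(ast.parse(source))
avg_params = sum(v.param_counts) / len(v.param_counts)    # 2.0
nesting_factor = sum(v.loop_depths) / len(v.loop_depths)  # 1.5, under this reading of the definition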

2. Architecture

  1. To calculate the code statistics, Python's ast module is used to generate an abstract syntax tree, which is then visited to compute five of the statistics. To detect code duplication, we use Python's dict (its hashmap implementation) to compare every 4 consecutive lines of code (see the sketch after this list).

  2. To perform the processing we use Kubernetes, Docker, and RabbitMQ. The producer adds a message to a queue for every repository that needs processing; using Kubernetes we then run several workers [5-10] that take messages from the queue, clone the repository, calculate the statistics, and push the result to a separate results queue in RabbitMQ.

  3. The results_parser consumes messages from the results queue and appends them to the JSON results file.

  4. The final results are found in results/results.json.
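
The 4-consecutive-line duplication check mentioned in item 1 can be sketched as below, using a plain dict as the hashmap; the function name and the hashing choice are illustrative, not the repository's exact code.

import hashlib

def duplicate_percentage(lines, window=4):
    """Illustrative: percentage of lines that belong to repeated 4-line windows.

    `lines` is assumed to already exclude blank lines, comments, and other
    non-code items; only occurrences after the first count as duplicates.
    """
    seen = {}           # window hash -> index of its first occurrence
    duplicated = set()  # indices of lines counted as duplicates
    for i in range(len(lines) - window + 1):
        key = hashlib.md5("\n".join(lines[i:i + window]).encode()).hexdigest()
        if key in seen:
            duplicated.update(range(i, i + window))
        else:
            seen[key] = i
    return 100.0 * len(duplicated) / len(lines) if lines else 0.0

For example, duplicate_percentage(["a = 1", "b = 2", "c = 3", "d = 4"] * 2) returns 50.0: the second copy of the four lines counts as duplicated, the first does not.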

2.1 Classes

  • Worker: connects to the RabbitMQ URLs queue, fetches a URL, processes it with RepoAnalyzer, and pushes the result to the results queue (see the sketch below).
  • Producer: loads the URLs file and adds the URLs to the queue for processing.
  • RepoAnalyzer: clones a repository, scans it for Python files, and runs the analysis using the Analyzer class.
  • ResultsParser: fetches results from the results queue and saves them to disk.
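
A minimal sketch of the Worker loop described above, assuming the pika client and the environment variables from section 3.2; analyze_repo is a hypothetical stand-in for the RepoAnalyzer call, and the message format is an assumption.

import json
import os

import pika

def analyze_repo(url):
    """Hypothetical stand-in: clone the repository at `url` and compute its statistics."""
    return {"url": url}

params = pika.ConnectionParameters(
    host=os.environ.get("RABBIT_HOST", "127.0.0.1"),
    port=int(os.environ.get("RABBIT_PORT", "5672")),
    credentials=pika.PlainCredentials(
        os.environ.get("RABBIT_USERNAME", "user"),
        os.environ.get("RABBIT_PASSWORD", "user"),
    ),
)
connection = pika.BlockingConnection(params)
channel = connection.channel()
urls_queue = os.environ.get("QUEUE_NAME", "URLS_QUEUE_NAME")
results_queue = os.environ.get("RESULTS_QUEUE", "RESULTS_QUEUE_NAME")
channel.queue_declare(queue=urls_queue)
channel.queue_declare(queue=results_queue)

def on_message(ch, method, properties, body):
    # Each message is assumed to carry one repository URL.
    stats = analyze_repo(body.decode())
    ch.basic_publish(exchange="", routing_key=results_queue, body=json.dumps(stats))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=urls_queue, on_message_callback=on_message)
channel.start_consuming()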

3. Installation

3.1 Dependencies

  • RabbitMQ
  • Python 3
  • Docker [recommended]

3.2 Steps

  1. Clone this repository
git clone https://github.com/melzareix/repo-analyzer.git
cd repo-analyzer
  2. Create a .env file with the following variables (you can also supply these variables via Docker); a sketch of how they might be loaded appears after these steps
RABBIT_USERNAME=user
RABBIT_PASSWORD=user
RABBIT_HOST=127.0.0.1
RABBIT_PORT=5672
QUEUE_NAME=URLS_QUEUE_NAME
RESULTS_QUEUE=RESULTS_QUEUE_NAME
  3. If you don't use Docker, use pip to install the dependencies
pip3 install -r requirements.txt
  4. If you are using Docker, build the container
docker build -t data-challenge .
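
The environment variables from step 2 might be loaded in code roughly as follows; whether the project uses python-dotenv or reads os.environ directly is an assumption here.

import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # copies the variables from .env into os.environ
rabbit_host = os.environ["RABBIT_HOST"]
rabbit_port = int(os.environ["RABBIT_PORT"])
rabbit_user = os.environ["RABBIT_USERNAME"]
rabbit_password = os.environ["RABBIT_PASSWORD"]
urls_queue = os.environ["QUEUE_NAME"]
results_queue = os.environ["RESULTS_QUEUE"]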

4. Usage

4.1 Non-Docker

  1. Run the producer to add URLs to the queue.
python3 src/producer.py
  2. Run the worker to process the URLs.
python3 src/worker.py
  3. After the worker finishes, run the results parser to parse the results.
python3 src/results_parser.py

4.2 Docker

  1. Run the producer to add URLs to the queue.
docker run -it --rm data-challenge /bin/sh -c "python3 src/producer.py"
  2. Run the worker to process the URLs.
docker run -it --rm data-challenge
  3. After the worker finishes, run the results parser to parse the results.
docker run -it --rm --name data-container data-challenge /bin/sh -c "python3 src/results_parser.py"
  4. Copy the results file from the container to your machine.
docker cp data-container:/results/results_100000.json results.json
