# 🚀 Automated README.md Generator for GitHub Repositories 📚

This flow is designed to streamline the process of creating comprehensive and informative README files for your projects.
By leveraging Generative AI models with the Automata Framework, we can automatically summarize the contents of each file within a repository and compile these summaries into a well-structured README.md file.


## Objective

The primary goal of this project is to develop an automated pipeline that:

1. **Summarizes Repository Files**: Executes in parallel to analyze and summarize all files within the repository, while intelligently excluding files and directories listed in the `.gitignore` file.
2. **Generates README Content**: Utilizes the aggregated summaries to craft the contents of a README.md file that highlights key aspects and provides a clear overview of the repository.
3. **Refines and Saves the README File**: Refines the generated draft and outputs the final README.md file, ready to be added to the repository for enhanced documentation and user guidance.
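
The three stages above can be sketched without any model calls. This is only an illustrative outline: `summarize_file`, `generate_readme`, `refine_readme`, and `run_pipeline` are hypothetical stand-ins, and the real notebook replaces each of them with an LLM-backed task.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_file(path: str, content: str) -> str:
    """Stand-in for the LLM summary: report the path and the line count."""
    return f"{path}: {len(content.splitlines())} lines"


def generate_readme(summaries: list[str]) -> str:
    """Stand-in for the README-writing step: list the summaries as bullets."""
    bullets = "\n".join(f"- {s}" for s in summaries)
    return f"# Project\n\n## Files\n\n{bullets}\n"


def refine_readme(readme: str) -> str:
    """Stand-in for the refinement step: normalize surrounding whitespace."""
    return readme.strip() + "\n"


def run_pipeline(files: dict[str, str]) -> str:
    # Stage 1: summarize every file in parallel (threads suit I/O-bound API calls).
    with ThreadPoolExecutor(max_workers=4) as pool:
        summaries = list(pool.map(summarize_file, files.keys(), files.values()))
    # Stage 2: merge the summaries into a draft README.
    draft = generate_readme(summaries)
    # Stage 3: refine the draft before it is saved.
    return refine_readme(draft)


example_files = {"app.py": "import os\nprint(os.getcwd())\n", "util.py": "x = 1\n"}
readme = run_pipeline(example_files)
print(readme)
```

`pool.map` returns results in input order, so the summaries line up with the files even though they are computed concurrently.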


### Step 1: Installing packages

#### Let's start by installing the `lyzr-automata` framework

In [None]:
pip install git+https://github.com/LyzrCore/lyzr-automata

#### Helper Functions to collect file contents from a specified directory

In [22]:
import os
import fnmatch

def read_gitignore(repo_path):
    """
    Reads and parses the .gitignore file in the given repository path.

    :param repo_path: String representing the path to the repository.
    :return: A list of ignore patterns found in the .gitignore file.
    """
    ignore_file = os.path.join(repo_path, '.gitignore')
    ignore_patterns = []
    try:
        with open(ignore_file, 'r', encoding='utf-8') as file:
            for line in file:
                line = line.strip()
                if line and not line.startswith('#'):
                    ignore_patterns.append(line)
    except FileNotFoundError:
        print(".gitignore file not found. Using All the Files.")
    return ignore_patterns


def should_ignore_file(path, ignore_patterns, repo_path):
    """
    Checks whether the given path should be ignored based on the .gitignore patterns.

    :param path: String representing the path to be checked.
    :param ignore_patterns: A list of patterns from the .gitignore file.
    :param repo_path: String representing the path to the repository.
    :return: Boolean indicating whether the path should be ignored.
    """
    # Convert the path to a relative path from the repo_path for correct pattern matching
    relative_path = os.path.relpath(path, repo_path)

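    for pattern in ignore_patterns:
        # Check if the pattern matches the relative path or the bare file/directory name
        if fnmatch.fnmatch(relative_path, pattern) or fnmatch.fnmatch(os.path.basename(path), pattern):
            return True
        # Directory patterns (trailing slash): match the directory itself at any depth,
        # as well as everything beneath it. This approximates .gitignore semantics;
        # it does not check that the matched path really is a directory.
        if pattern.endswith('/'):
            dir_pattern = pattern.rstrip('/')
            if (fnmatch.fnmatch(relative_path, dir_pattern)
                    or fnmatch.fnmatch(os.path.basename(path), dir_pattern)
                    or fnmatch.fnmatch(relative_path, dir_pattern + '/*')):
                return True
    return False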


def scrape_files_contents(repo_path):
    """
    Scrapes the contents of all files in the given repository path that are not ignored by .gitignore.

    :param repo_path: String representing the path to the repository.
    :return: A dictionary with file paths as keys and their contents as values.
    """
    ignore_patterns = read_gitignore(repo_path)
    file_contents = {}

    for root, dirs, files in os.walk(repo_path):
        # Ensure that `should_ignore_file` is called with `repo_path` for correct evaluation
        dirs[:] = [d for d in dirs if not should_ignore_file(os.path.join(root, d), ignore_patterns, repo_path)]
        files[:] = [f for f in files if not should_ignore_file(os.path.join(root, f), ignore_patterns, repo_path)]

        for file in files:
            file_path = os.path.join(root, file)
            try:
                with open(file_path, 'r', encoding='utf-8') as f:
                    file_contents[file_path] = f.read()
            except (IOError, UnicodeDecodeError) as exc:
                print(f"Error reading file {file_path}: {exc}")

    return file_contents


def scrape_repo_files(repo_path):
    """
    Generates a list of formatted strings containing the path and content of non-ignored files.

    :param repo_path: String representing the path to the repository.
    :return: A list of formatted strings for each of the non-ignored files.
    """
    contents = scrape_files_contents(repo_path)
    repo_data = [f"File: {file_path}\n\n{content}\n\n" for file_path, content in contents.items()]
    return repo_data


def truncate_large_files(string_list, max_length=10000):
    """
    Truncate strings in the provided list that are longer than the specified threshold.

    Strings longer than max_length characters are shortened by keeping roughly
    max_length / 2 characters from the start and from the end, joined by a
    "..." marker on its own line.

    :param string_list: List of strings to be modified.
    :param max_length: Maximum allowed length for the strings.
    :return: A list of strings, modified if they exceed the length threshold.
    """
    modified_list = []
    start_keep = (max_length + 1) // 2
    end_keep = (max_length - 1) // 2

    for s in string_list:
        if len(s) > max_length:
            modified_string = s[:start_keep] + "\n...\n" + s[-end_keep:]
            modified_list.append(modified_string)
        else:
            modified_list.append(s)
    return modified_list
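
The helpers above rely on `fnmatch`, which only approximates `.gitignore` matching. The short sketch below uses made-up paths and patterns to show where the two differ, so you can judge whether the approximation is good enough for your repository.

```python
import fnmatch

# `*` in fnmatch also matches path separators, so a bare extension pattern
# matches files at any depth.
nested_pyc = fnmatch.fnmatch("pkg/cache/module.pyc", "*.pyc")

# For the same reason this matches, although git itself would not match it,
# because in .gitignore a single `*` does not cross a `/`.
nested_md = fnmatch.fnmatch("docs/api/index.md", "docs/*.md")

# A trailing-slash pattern never matches the directory name itself...
dir_itself = fnmatch.fnmatch("build", "build/")

# ...which is why directory patterns need the extra `pattern + '*'` style check.
dir_child = fnmatch.fnmatch("build/out.o", "build/*")

# Negation is not understood: a leading `!` is treated as a literal character.
negated = fnmatch.fnmatch("keep.log", "!keep.log")

print(nested_pyc, nested_md, dir_itself, dir_child, negated)
```

If exact `.gitignore` behavior matters (negation, anchored patterns, `**`), a dedicated parser is a safer choice than `fnmatch`.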

### **Step 2** : Create the model and initialize it with your API key and parameters

In [21]:
import asyncio
from google.colab import userdata
from lyzr_automata.ai_models.openai import OpenAIModel
from lyzr_automata.agents.agent_base import Agent
from lyzr_automata.tasks.task_base import Task
from lyzr_automata.tasks.task_literals import InputType, OutputType

# We will first create open ai model for our language tasks
# and set params according to chat completion.

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')  # Read the key from Colab secrets and export it as an environment variable

open_ai_model_text = OpenAIModel(
    api_key= userdata.get('OPENAI_API_KEY'),
    parameters={
        "model": "gpt-4-turbo-preview",
        "temperature": 1,
    },
)
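
The cell above depends on `google.colab.userdata`, which is only available inside Colab. As a minimal sketch for other environments, the hypothetical helper `load_api_key` below reads the key from an environment variable and falls back to an interactive prompt; it is an assumption for illustration, not part of `lyzr-automata`.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting once if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value so the key does not end up in cell output.
        key = getpass(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key
```

The returned value could then be passed as `api_key` when constructing the model, in place of `userdata.get('OPENAI_API_KEY')`.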

### **Step 3** : Create Agents for Summary and README Generation


In [11]:
code_summarize_agent = Agent(
    prompt_persona="""You are a Senior Software Engineer. Your task is to analyze the content of an individual file in the repository and write a CONCISE summary, which will then be used to create the README.md file for the entire repository.

    Given a file from a software repository, extract the following meaningful information that will be essential for building a comprehensive README.md:

    File Name: Name of the file.
    Purpose: Summarize the primary purpose or functionality of the file, with main functions.
    Usage/Dependencies/Highlights: A short note on the Usage, any external libraries, dependencies or any special remarks, TODOs, or important comments.

    Extract this information in a structured format so that it can be efficiently used to build a README.md file later.
    """,
    role="Code Explainer",
)

readme_writer_agent = Agent(
    prompt_persona="You are a professional README.md writer. You will receive a summary of every file in the repo; your task is to draft a beautiful README file for the repo.",
    role="Repository Documentation Specialist",
)

### **Step 4** : Get Summary of all the files in the repo

In [None]:
# Using the Lyzr Repo for Demo
!git clone https://github.com/lyzrCore/lyzr/

In [23]:
# using the core code of lyzr instead of full repo
repo_path = '/content/lyzr/lyzr'
repo_data = scrape_repo_files(repo_path=repo_path)
truncated_repo_data = truncate_large_files(string_list=repo_data, max_length=12000)

summery_list = []


.gitignore file not found. Using All the Files.
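
To make the truncation step concrete, here is a small standalone sketch of the same head-and-tail idea on a short string. `head_and_tail` is a hypothetical name used only for this illustration; it mirrors the split used by `truncate_large_files` and adds a guard for the case where no tail characters are kept.

```python
def head_and_tail(text: str, max_length: int, marker: str = "\n...\n") -> str:
    """Keep the start and end of an over-long string, joined by a marker."""
    if len(text) <= max_length:
        return text
    start_keep = (max_length + 1) // 2
    end_keep = (max_length - 1) // 2
    # text[-0:] would return the whole string, so handle end_keep == 0 explicitly.
    tail = text[-end_keep:] if end_keep else ""
    return text[:start_keep] + marker + tail


sample = "abcdefghijklmnopqrstuvwxyz"
shortened = head_and_tail(sample, max_length=10)
print(repr(shortened))  # 'abcde\n...\nwxyz'
print(len(shortened))   # 14
```

Note that the result can be slightly longer than `max_length` because the marker is added on top of the kept characters.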


In [24]:
# Define an asynchronous function to execute the summarization task
async def summarize_file(file_data):
    summarize_task = Task(
        name="Draft Concise Summary of the file",
        agent=code_summarize_agent,
        output_type=OutputType.TEXT,
        input_type=InputType.TEXT,
        model=open_ai_model_text,
        instructions="Analyze the individual file from the repo and create a summary with brevity",
        default_input=file_data,
        enhance_prompt=False,
        log_output=True,
    )
    # Run the blocking `execute()` call in the default thread pool so the event loop stays free
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, summarize_task.execute)

# Run all the summarization tasks concurrently
async def run_summarization_tasks():
    tasks = [summarize_file(file_data) for file_data in truncated_repo_data]
    results = await asyncio.gather(*tasks)
    return results

# Execute the run_summarization_tasks coroutine and get the summaries
summery_list = await run_summarization_tasks()

# Join the per-file summaries into one string, separated by blank lines, for the README writer
complete_repo_summery = '\n\n'.join(summery_list)

START TASK Draft Concise Summary of the file :: start time : 1709048024.5063426
START TASK Draft Concise Summary of the file :: start time : 1709048024.5074658
START TASK Draft Concise Summary of the file :: start time : 1709048024.5267217
START TASK Draft Concise Summary of the file :: start time : 1709048024.5541155
START TASK Draft Concise Summary of the file :: start time : 1709048024.5715876
START TASK Draft Concise Summary of the file :: start time : 1709048024.587012
output : ### File Summary

- **File Name**: /content/lyzr/lyzr/voicebot/__init__.py

- **Purpose**: This file is responsible for initializing the `voicebot` package within the larger `lyzr` Python project. Its main functionality includes making the `VoiceBot` class from the `voicebot.voicebot` module available for import when the `voicebot` package is imported.

- **Usage/Dependencies/Highlights**:
    - **Usage**: This file is typically automatically executed when the `voicebot` package is imported into a Python sc
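
The summarization cell above launches every file at once, which may run into API rate limits on larger repositories. The sketch below shows one way to cap concurrency with `asyncio.Semaphore`; `blocking_summarize` is a hypothetical stand-in for the blocking `execute()` call, and `max_concurrent` is an illustrative parameter.

```python
import asyncio
import time


def blocking_summarize(text: str) -> str:
    """Stand-in for a blocking model call."""
    time.sleep(0.05)
    return text.upper()


async def summarize_all(items: list[str], max_concurrent: int = 3) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)
    loop = asyncio.get_running_loop()

    async def worker(item: str) -> str:
        # At most `max_concurrent` workers run the blocking call at a time.
        async with semaphore:
            return await loop.run_in_executor(None, blocking_summarize, item)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(item) for item in items))


results = asyncio.run(summarize_all(["a", "b", "c", "d"]))
print(results)  # ['A', 'B', 'C', 'D']
```

Inside a notebook, where an event loop is already running, use `results = await summarize_all([...])` instead of `asyncio.run(...)`.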

### **Step 5** : Create the README.md file

---



In [26]:
readme_writing_task = Task(
    name="Professional Readme Writing",
    agent=readme_writer_agent,
    output_type=OutputType.TEXT,
    input_type=InputType.TEXT,
    model=open_ai_model_text,
    instructions=""" Collect and merge individual file summaries from the repository, emphasizing key features, organizing logically for flow, ensuring consistency, and refining the final README.md for completeness and clarity. Write a beautiful and detailed README.md file that provides a complete overview of the repo and a quick start for the user""",
    default_input=complete_repo_summery,
    log_output=True,
    enhance_prompt=False,
).execute()

print(readme_writing_task)
file_name = 'README.md'
with open(file_name, 'w', encoding='utf-8') as readme_file:
    readme_file.write(readme_writing_task)

START TASK Professional Readme Writing :: start time : 1709048219.0767934
output : # Lyzr: Elevate Your Data Analysis and Bot Interaction

Welcome to Lyzr, a comprehensive Python library designed to revolutionize how data scientists and developers interact with data and create intelligent bots. The Lyzr library spans across a broad spectrum of functionalities, offering seamless chat and voice bot interfaces, advanced data analysis tools, and a robust formula generation utility. This README.md provides a detailed overview of the library's capabilities, emphasizing its key features, organizational logic, and quick start guide for users.

## Overview

Lyzr aims to simplify the process of data handling, analysis, and bot interaction in Python. By amalgamating various tools and services into one cohesive library, Lyzr streamlines workflows and enhances productivity. Here’s what makes Lyzr stand out:

- **Bot Interfaces**: With classes like `ChatBot` and `VoiceBot`, create responsive and int

#### Optional

Refine, Modify and Save the README File
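
The refinement prompt in the next cell embeds the generated README inside a Markdown fence. A generated README usually contains its own fenced code blocks, which can end the outer fence early. One possible safeguard, sketched here with a hypothetical `fence` helper, is to pick a fence longer than any backtick run in the content.

```python
import re


def fence(content: str, info: str = "") -> str:
    """Wrap content in a backtick fence longer than any backtick run inside it."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    ticks = "`" * max(3, longest + 1)
    return f"{ticks}{info}\n{content}\n{ticks}"


# Build a sample README that itself contains a three-backtick code block.
inner = "`" * 3
sample_readme = f"Install:\n{inner}bash\npip install example\n{inner}"

fenced = fence(sample_readme, "md")
print(fenced.splitlines()[0])   # four backticks followed by "md"
print(fenced.splitlines()[-1])  # four backticks
```

The same helper could wrap both the previous README and the file summaries before they are interpolated into the prompt.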

In [27]:
readme_improvement_feedback = """The README looks too bland and somewhat informal. It also lacks detail.
Improve the structure, add a few emojis while keeping it professional, and add more detail.
"""

refinement_input = f"""
Previously Generated README.md file:
```md
{readme_writing_task}
```

Complete Summary for each and every file in the Repository:
```
{complete_repo_summery}
```
"""

readme_improvement_agent = Agent(
    prompt_persona="You are a staff software engineer. Your task is to improve the previously generated README.md file, make it professional, and make sure it adheres to industry standards. Prioritize user feedback, if any, when making modifications.",
    role="Repository Documentation Specialist",
)

readme_refinement_task = Task(
    name="Professional Readme Writing",
    agent=readme_improvement_agent,
    output_type=OutputType.TEXT,
    input_type=InputType.TEXT,
    model=open_ai_model_text,
    instructions= f""" Improve the README.md file based on the user feedback.
    Below is the user feedback:
    ```
    {readme_improvement_feedback}
    ```
    Collect and merge individual file summaries from the repository, emphasizing key features, organizing logically for flow, ensuring consistency, and refining the final README.md for completeness and clarity.
    Write a beautiful and detailed README.md file that provides a complete overview of the repo and a quick start for the user""",
    default_input=refinement_input,
    log_output=True,
    enhance_prompt=False,
).execute()

file_name = 'Refined-README.md'
with open(file_name, 'w', encoding='utf-8') as readme_file:
    # Write the refined output, not the first draft from Step 5
    readme_file.write(readme_refinement_task)

START TASK Professional Readme Writing :: start time : 1709049505.2961154
output : Given the lack of specific user feedback or content to generate a detailed README.md, let's construct an improved README.md for the **Lyzr** repository based on general best practices, incorporating elements like structure, user feedback consideration, and detailed instructions.

---

# Lyzr: Data Analysis & Bot Interaction Python Library

Welcome to the **Lyzr** Python library, your comprehensive toolkit designed to revolutionize the way data scientists and developers analyze data and interact with bots. Whether you're crafting intelligent bots or delving deep into data analysis, Lyzr equips you with the necessary tools to enhance productivity and innovation.

## Table of Contents 📚

- [Overview](#overview)
- [Key Features](#key-features)
  - [Bot Interaction](#bot-interaction)
  - [Data Analysis and Connection](#data-analysis-and-connection)
  - [Document Reading and Formula Generation](#document-readi