# Setup

In [16]:
# define llm
llm_config = {"model": "gpt-4o"}

# Group Chat 
- user_proxy or admin 
- data sources reader 
- planner 
- web scraper 
- engineer
- executor 
- writer 
- content checking and validation to the given resources 

In [17]:
import autogen
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [18]:
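# Helpers that fetch a resource from a web page or a JSON API
def read_data_from_url(url):
    """Return the visible text of the page at `url`, or the error message on failure."""
    try:
        # A timeout keeps an unreachable host from blocking the notebook
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        # Strip the markup and return only the text content
        return soup.get_text()
    except requests.RequestException as e:
        return str(e)

def read_data_from_api(api_url):
    """Return the decoded JSON body from `api_url`, or the error message on failure."""
    try:
        response = requests.get(api_url, timeout=30)
        response.raise_for_status()
        return response.json()
    except (requests.RequestException, ValueError) as e:
        # ValueError covers a response body that is not valid JSON
        return str(e)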

In [19]:
# retrieve
def read_data(agent, resource_type, resource_location):
    # Call the function to read data from a URL
    if resource_type == 'url':
        return read_data_from_url(resource_location)
    # Call the function to read data from an API
    elif resource_type == 'api':
        return read_data_from_api(resource_location)
    else:
        return "Unsupported resource type."

In [20]:
# agent to read data
data_sources_reader = autogen.ConversableAgent(
    name="data_sources_reader",
    system_message="Identify and retrieve data from various data sources such as APIs, databases, and CSV files.",
    description="Data Sources Reader. Gathers data from multiple sources.",
    llm_config=llm_config,
)

In [21]:
# Read data 

# data type
resource_type = 'url'  
# data location 
resource_location = 'https://www.nextflow.io/docs/latest/index.html#'  
data = read_data(data_sources_reader, resource_type, resource_location)

print(data)

HTTPSConnectionPool(host='www.nextflow.io', port=443): Max retries exceeded with url: /docs/latest/index.html (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x00000237DFA73F90>: Failed to resolve 'www.nextflow.io' ([Errno 11001] getaddrinfo failed)"))
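
Note that the value printed above is an error message, not page text: the reader returns `str(e)` on failure, so a failed fetch is indistinguishable from real content and is passed on to the chat as `data`. One way to keep the two apart is sketched below. It assumes a reader that lets exceptions propagate; `FetchResult`, `safe_read`, and `failing_reader` are hypothetical names for illustration and are not part of autogen or requests.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FetchResult:
    """Outcome of one read attempt (hypothetical helper, not an autogen type)."""
    ok: bool
    content: Any = None
    error: Optional[str] = None


def safe_read(reader: Callable[[str], Any], location: str) -> FetchResult:
    """Call reader(location) and record whether it succeeded."""
    try:
        return FetchResult(ok=True, content=reader(location))
    except Exception as exc:  # broad on purpose: any failure means "no data"
        return FetchResult(ok=False, error=f"{type(exc).__name__}: {exc}")


def failing_reader(location: str) -> str:
    # Stand-in for a reader that raises instead of returning str(e).
    raise ConnectionError(f"Failed to resolve {location}")


result = safe_read(failing_reader, "https://www.nextflow.io/docs/latest/index.html")
if not result.ok:
    print("Fetch failed, not starting the chat:", result.error)
```

With a check like this, the group chat can be skipped or retried when `result.ok` is false, instead of asking the agents to write from an error string.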


In [26]:
# Agent 

# user_proxy agent 
user_proxy = autogen.ConversableAgent(
    name="user_proxy",
    system_message="Give the task and send instructions to the writers to refine the blog post. When requesting input from the human, keep the request simple and understandable.",
    code_execution_config=False,
    llm_config=llm_config,
    human_input_mode="ALWAYS"
)

# agent planner to write ToC
planner = autogen.ConversableAgent(
    name="planner",
    system_message=f"""
        Create a table of contents based on the resource at {resource_location}, with exactly 6 sections of valuable information.
        - Create a quote related to the context.
        - Assign sections 1 and 2 of the ToC to writer_1_2.
        - Assign sections 3 and 4 of the ToC to writer_3_4.
        - Assign sections 5 and 6 of the ToC to writer_5_6.
        Instruct the web scraper and engineer on their tasks for each ToC section.
    """,
    description="Planner. Plans the task and assigns sections to the appropriate writer agents.",
    llm_config=llm_config,
)

# web scraper agent 
web_scraper = autogen.ConversableAgent(
    name="web_scraper",
    system_message=f"Scrape data from websites based on the planner's instructions for each ToC section. Extract, clean, and format the data, and make sure the data is scraped only from {resource_location}.",
    description="Web Scraper. Scrapes data from the web and processes it.",
    llm_config=llm_config,
)

# engineer agent to write code 
engineer = autogen.ConversableAgent(
    name="engineer",
    system_message="Write and modify code based on the planner's instructions for each ToC section. Implement web scraping and data processing.",
    description="Engineer. Writes and modifies code specific to each ToC section.",
    llm_config=llm_config,
)

# agent to create image 
painter = autogen.ConversableAgent(
    name="painter",
    system_message="create images based on the images needed for each section",
    description="Painter. Create image related to the content given",
    llm_config=llm_config
)

# agent executor to execute code
executor = autogen.ConversableAgent(
    name="executor",
    system_message="Execute the code provided by the engineer and report the results.",
    description="Executor. Runs the code and provides outputs.",
    human_input_mode="NEVER",
    code_execution_config={
        "last_n_messages": 3,
        "work_dir": "coding_report_generation",
        "use_docker": False,
    },
)

# agent writer 
# writer agent for sections 1-2
writer_1_2 = autogen.ConversableAgent(
    name="writer_1_2",
    system_message=f"Write detailed content for sections 1 and 2 of the ToC in markdown format using the data provided by the executor. Ensure that the content is comprehensive, well-structured, and includes two paragraphs for each section. Work only on the sections assigned by the planner, and base the content on {resource_location}.",
    description="Writer 1-2. Focuses on creating detailed content for sections 1 and 2.",
    llm_config=llm_config,
)

# writer agent for sections 3-4
writer_3_4 = autogen.ConversableAgent(
    name="writer_3_4",
    system_message=f"Write detailed content for sections 3 and 4 of the ToC in markdown format using the data provided by the executor. Ensure that the content is comprehensive, well-structured, and includes two paragraphs for each section. Work only on the sections assigned by the planner, and base the content on {resource_location}.",
    description="Writer 3-4. Focuses on creating detailed content for sections 3 and 4.",
    llm_config=llm_config,
)

# writer agent for sections 5-6
writer_5_6 = autogen.ConversableAgent(
    name="writer_5_6",
    system_message=f"Write detailed content for sections 5 and 6 of the ToC in markdown format using the data provided by the executor. Ensure that the content is comprehensive, well-structured, and includes two paragraphs for each section. Work only on the sections assigned by the planner, and base the content on {resource_location}.",
    description="Writer 5-6. Focuses on creating detailed content for sections 5 and 6.",
    llm_config=llm_config,
)

# agent content check 
content_checker = autogen.ConversableAgent(
    name="content_checker",
    system_message="Validate the content against the original data sources. Ensure accuracy and relevance.",
    description="Content Checker. Validates the final content.",
    llm_config=llm_config,
)

final_touch = autogen.ConversableAgent(
    name="final_touch",
    system_message="Give the final output after all agents have finished their tasks. Make the generated output neater and more like a blog post, and remove stray characters.",
    description="Final Touch. Produces the final output after all agents have finished their tasks.",
    llm_config=llm_config
)
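
A note on the `{resource_location}` placeholders used in the system messages: Python only fills them in when the string is an f-string or is passed through `str.format`. In a plain string literal the braces reach the model unchanged. The sketch below shows the difference with a shortened, made-up prompt; the variable names other than `resource_location` are illustrative only.

```python
resource_location = "https://www.nextflow.io/docs/latest/index.html"

# Plain string: the braces are sent to the model literally.
plain = "Scrape data only from the {resource_location} given."

# f-string: the value is substituted when the string is created.
formatted = f"Scrape data only from the {resource_location} given."

# str.format: substitution is deferred until the value is known.
template = "Scrape data only from the {resource_location} given."
deferred = template.format(resource_location=resource_location)

print(plain)
print(formatted)
```

The f-string form requires `resource_location` to be defined before the agents are created, which is the case here because the cell that reads the data runs first.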

In [31]:
# group chat 
groupchat = autogen.GroupChat(
    agents=[user_proxy, data_sources_reader, planner, web_scraper, engineer, executor, painter, writer_1_2, writer_3_4, writer_5_6, content_checker, final_touch],
    messages=[],
    max_round=10
)

# Manager agent that manages the group chat
manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)

In [32]:
task_1 = f"""
        Create one chapter on Nextflow basic installation based on {resource_location}, following these steps:
        - Create the table of contents
        - Scrape the given resource based on the table of contents
        - Write the content
        - Generate images based on the content, for anything that is easier to understand visually, such as a workflow
        - Make sure that the content is based on the given resource
        - Write each section in detail with a long explanation; if a table is needed, create one
        - Give each section at least 3 paragraphs
        - Make the final output look like a blog post
"""

In [33]:
# Start 
groupchat_result = user_proxy.initiate_chat(
    manager,
    message=task_1,
    additional_data={"data": data}
)

user_proxy (to chat_manager):


        Create one chapter on Nextflow basic installation based on https://www.nextflow.io/docs/latest/index.html#, following these steps:
        - Create the table of contents
        - Scrape the given resource based on the table of contents
        - Write the content
        - Generate images based on the content, for anything that is easier to understand visually, such as a workflow
        - Make sure that the content is based on the given resource
        - Write each section in detail with a long explanation; if a table is needed, create one
        - Give each section at least 3 paragraphs
        - Make the final output look like a blog post


--------------------------------------------------------------------------------

Next speaker: planner

>>>>>>>> USING AUTO REPLY...
planner (to chat_manager):

Sure! Here's how we will break down 

## Nextflow Basic Installation

---

### Table of Contents

1. **Introduction to Nextflow**
2. **Prerequisites and System Requirements**
3. **Installation Steps**
4. **Post-Installation and Verification**

---

### 1. Introduction to Nextflow

Nextflow is a workflow system built around a dataflow programming model and is used predominantly in bioinformatics. It was designed to address the challenges of running complex computational workflows.

Nextflow enables the writing of scalable and reproducible scientific workflows. It provides a fluent and human-readable method of handling data streams, which eases the management and automation of analytical pipelines. In this section, we cover the essentials of Nextflow, including its core features, benefits, and typical use cases.

**Key Features and Benefits**
- **Scalability**: One of Nextflow's standout features is its ability to run on both single-machine setups and large distributed environments, including clusters and cloud services. It scales seamlessly from small to large computational workloads without needing to alter the core pipeline.
- **Portability**: Nextflow ensures portable workflows. With support for Docker, Singularity, and other containers, users can encapsulate their computations in self-contained environments, ensuring consistent behavior across different systems and infrastructures.
- **Flexibility**: The tool integrates effortlessly with a range of existing data infrastructures and tools like AWS, Google Cloud, and HPC schedulers. It removes the complexity of transitioning workflows across differing systems, saving time and reducing potential errors.
- **Reproducibility**: Reproducibility is central to trust in scientific results. Nextflow's built-in support for version control and modular pipeline design ensures that analyses can be repeated or re-executed efficiently across different environments.

**Typical Use Cases**
- **Genomics Sequencing**: Handling the massive datasets involved in genomics studies, particularly for alignment, variant calling, and genome assembly.
- **Biomedical Research**: Integrating various biomedical datasets and performing comprehensive analyses to draw meaningful conclusions.
- **Molecular Dynamics**: Running simulations and analysis workflows that require significant computational power and sophisticated pipeline orchestration.

Nextflow’s capability to streamline complex workflows and maintain high levels of automation, reproducibility, and flexibility makes it indispensable in modern computational science.

---

### 2. Prerequisites and System Requirements

Before you begin with the installation of Nextflow, it’s essential to ensure that your system meets the necessary requirements. This includes both hardware specifications and vital software dependencies to guarantee that Nextflow runs smoothly.

**Minimum Hardware Requirements**
- **CPU**: Dual-core processor for basic operations. For heavy computational tasks, a more powerful multi-core CPU is recommended.
- **RAM**: At least 4 GB of RAM is required. For memory-intensive workflows, it's advisable to have 16 GB or more.
- **Disk Space**: Ensure you have at least 10 GB of free disk space. Additional space might be necessary based on the complexity and data intensity of your workflows.
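
The CPU and disk figures above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the thresholds are copied from the list, `meets_minimums` is a hypothetical helper name, and RAM is left out because the standard library has no portable call for it.

```python
import os
import shutil

# Thresholds copied from the list above; adjust them for real workloads.
MIN_CPUS = 2
MIN_FREE_DISK_GB = 10


def meets_minimums(cpus: int, free_disk_gb: float) -> dict:
    """Compare measured values against the minimums from the list."""
    return {
        "cpu": cpus >= MIN_CPUS,
        "disk": free_disk_gb >= MIN_FREE_DISK_GB,
    }


cpus = os.cpu_count() or 1  # cpu_count() can return None
free_gb = shutil.disk_usage(".").free / 1024**3  # free space where you run this
print(meets_minimums(cpus, free_gb))
```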

**Software Dependencies**
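- **Java**: Nextflow needs a Java runtime, and the minimum version depends on the release: older releases such as 20.10 ran on Java 8, while current releases require Java 17 or later. You can check if Java is installed and its version by running `java -version` in your terminal.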
- **Docker**: For workflows leveraging containerization, Docker is highly recommended. This ensures consistent computational environments.
- **Git**: Nextflow can pull pipelines from Git repositories, and having Git installed is useful for versioning your own workflow scripts and collaborating with others.
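
To see at a glance which of these tools are already on your `PATH`, a short check can help. The sketch below assumes the executables are named `java`, `docker`, and `git`; `find_tools` is a hypothetical helper, and the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional

REQUIRED_TOOLS = ("java", "docker", "git")


def find_tools(
    names: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

A tool being found only means the executable exists. It does not confirm the version, and it does not confirm that the Docker daemon is running.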

**Preliminary Setup Steps**
1. **Update Package Lists**: It's always a good practice to update your package lists to ensure you get the latest versions of the software packages:
    ```sh
    sudo apt-get update
    ```
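2. **Install Java**: If Java is not already installed, you can install it with your package manager. The example below installs OpenJDK 11; choose the version your Nextflow release requires: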
    ```sh
    sudo apt-get install openjdk-11-jre
    ```
3. **Verify Java Installation**: To verify that Java is installed correctly, use:
    ```sh
    java -version
    ```
4. **Install Docker**: Follow Docker's official documentation to install Docker on your system.

By ensuring these prerequisites are met, you can be confident that your system is ready for the installation of Nextflow, leading to fewer issues and smoother operations.

---

### 3. Installation Steps

The installation of Nextflow is straightforward but requires careful adherence to the steps to ensure a successful setup. Below, we describe the step-by-step process for installing Nextflow on your machine.

**Download Instructions**
To install Nextflow, you need to download a script from the official repository and then execute it. The process is as follows:
1. Open your terminal.
2. Run the following command to download and execute the Nextflow installation script:
    ```sh
    curl -s https://get.nextflow.io | bash
    ```
    This command fetches the script and runs it, downloading the Nextflow executable file (typically named `nextflow`) into your current working directory.
3. Make the Nextflow executable file runnable by setting the appropriate permissions:
    ```sh
    chmod +x nextflow
    ```

**Configuration**
After downloading and setting up the executable file, the next step is to ensure it’s accessible from anywhere in your terminal. This is done by moving it to a directory included in your system’s PATH.
1. Move the `nextflow` file to a directory like `/usr/local/bin` for universal accessibility:
    ```sh
    sudo mv nextflow /usr/local/bin/
    ```
2. Verify the installation by checking the version:
    ```sh
    nextflow -version
    ```

By following these steps, you have successfully installed Nextflow on your system. The diagram below illustrates the installation workflow for better understanding.

![Installation Workflow](https://i.imgur.com/CQwvSUH.png)

---

### 4. Post-Installation and Verification

Once Nextflow is installed, the next crucial step is to verify that it’s working correctly on your system. This involves running a simple script to ensure that Nextflow can execute properly and all dependencies are correctly configured.

**Running First Script**
To verify the installation, you can run a basic "Hello, World!" script. Here’s how to do it:
1. Create a new file named `hello.nf` and add the following content:
    ```groovy
    #!/usr/bin/env nextflow

    process sayHello {
        output:
        path 'hello.txt'

        script:
        """
        echo "Hello, World!" > hello.txt
        """
    }

    workflow {
        sayHello()
    }
    ```
2. Save the file and return to the terminal.
3. Execute the script using the following command:
    ```sh
    nextflow run hello.nf
    ```
4. If the script runs successfully, Nextflow writes a file named `hello.txt`, containing the text "Hello, World!", inside the task's own directory under `work/`.
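
Because Nextflow runs each task in its own subdirectory of `work/`, the output file is not in the folder where you launched the run. A small sketch for locating it, assuming the default `work` directory name (`find_task_outputs` is a hypothetical helper):

```python
from pathlib import Path
from typing import List


def find_task_outputs(filename: str, work_dir: str = "work") -> List[Path]:
    """Return every copy of `filename` found below the work directory."""
    root = Path(work_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob(filename))


for path in find_task_outputs("hello.txt"):
    print(path, "->", path.read_text().strip())
```

If the loop prints nothing, either the run has not happened yet or it used a different work directory.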

**Common Troubleshooting Tips**
- **Java Issues**: Ensure that Java is correctly installed and configured. You can verify this by running `java -version`. If Java is not found, ensure it’s installed and added to your PATH.
- **Docker Integration**: If using Docker, ensure it’s correctly installed and running. Test Docker with a simple command like `docker --version`.
- **Path Issues**: Verify that the `nextflow` executable is in your PATH. You can do this by running `which nextflow` to see if the correct path is listed.
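
The checks in these tips can also be run together from Python. This is a sketch under a few assumptions: the commands are the ones named above, `version_output` is a hypothetical helper, and a missing executable is reported as `None` rather than raised. Both output streams are read because `java -version` prints to stderr.

```python
import subprocess
import sys
from typing import List, Optional


def version_output(command: List[str]) -> Optional[str]:
    """Run a version command; return its first output line, or None if unavailable."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # `java -version` prints to stderr, so fall back to it when stdout is empty.
    text = (proc.stdout or proc.stderr).strip()
    return text.splitlines()[0] if text else None


for cmd in (["java", "-version"], ["docker", "--version"], ["nextflow", "-version"]):
    print(" ".join(cmd), "->", version_output(cmd) or "not available")
```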

The verification process ensures that Nextflow is fully operational, positioning you to start developing and running complex workflows. The diagram below demonstrates the verification workflow.

![Verification Workflow](https://i.imgur.com/axIq0lK.png)

By following the detailed installation and verification steps outlined in this guide, you are now ready to harness Nextflow’s full power for managing your computational workflows efficiently and effectively.


-- SECOND EXAMPLE USING 6 TOC --

## Nextflow Basic Installation

### Table of Contents
1. [Overview](#overview)
2. [System Requirements and Prerequisites](#system-requirements-and-prerequisites)
3. [Downloading and Installing Nextflow](#downloading-and-installing-nextflow)
4. [Setting up the Environment](#setting-up-the-environment)
5. [Verifying the Installation](#verifying-the-installation)
6. [Conclusion](#conclusion)

---

## Overview

Nextflow is an open-source workflow tool that originated in the Computational Biology group at the Centre for Genomic Regulation (CRG). It is designed to streamline the creation and management of complex computational workflows, which is particularly important in bioinformatics, where the analysis of large-scale datasets is commonplace. Its scripting language extends Groovy, which runs on the Java Virtual Machine, so Nextflow integrates with existing Java libraries and frameworks.

A standout feature of Nextflow is its support for container technologies such as Docker and Singularity. This feature simplifies deploying workflows across various computing environments, whether on-premise or in the cloud. The adaptability and scalability of Nextflow make it a preferred tool for researchers and developers engaged in high-performance computing tasks, particularly in fields like genomics and other data-intensive disciplines.

---

## System Requirements and Prerequisites

Before embarking on the installation of Nextflow, it is essential to ensure that your system meets specific prerequisites to guarantee a smooth installation and operational process. Below are the necessary system specifications and prerequisites:

### System Requirements

- **Operating System**: Nextflow runs on POSIX-compatible systems such as Linux and macOS. On Windows it runs through the Windows Subsystem for Linux (WSL).
- **Java Runtime Environment (JRE)**: Nextflow needs a Java runtime. The release shown in this chapter (20.10) runs on Java 8 or later; current releases require Java 17 or later.

### Pre-installation Checks

#### 1. Java Installation
Ensure that a Java version supported by your Nextflow release is installed on your system. Java can be downloaded from the [official Java website](https://www.java.com/en/download/).

#### 2. Verify Java Installation
To confirm Java has been installed properly, use the following command in your terminal:

```sh
java -version
```

This should produce output similar to:

```sh
java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)
```
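
Java version strings come in two schemes: the legacy form `1.8.0_281` means Java 8, while newer runtimes report the major version first, as in `17.0.9`. A sketch for extracting the major version from this kind of output (`java_major_version` is a hypothetical helper, and the regular expression assumes the quoted `version "..."` format shown above):

```python
import re
from typing import Optional


def java_major_version(version_text: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first, second = int(match.group(1)), match.group(2)
    # Legacy scheme: "1.8.0_281" means Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first


sample = 'java version "1.8.0_281"'
print(java_major_version(sample))  # prints 8
```

Comparing the result with the minimum required by your Nextflow release gives a quick yes or no before you continue with the installation.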

### Ensuring Adequate Disk Space and Permissions
It is important to ensure that your system has enough disk space to accommodate Nextflow and the data it will process. Additionally, verify that you have the required permissions to execute installation commands. If using Docker, ensure Docker is installed and running correctly on your machine.

---

## Downloading and Installing Nextflow

The process of installing Nextflow is straightforward and can typically be completed within a few steps. Here's a detailed guide to downloading and setting up Nextflow on your machine:

### Downloading Nextflow

#### 1. Open Terminal
Begin by opening your system’s terminal or command line interface.

#### 2. Download the Nextflow Binary
Execute the following command to download the Nextflow executable:

```sh
curl -s https://get.nextflow.io | bash
```

### Moving Nextflow to Executable Path

After downloading, it is necessary to move the Nextflow binary to a directory included in your system’s PATH to facilitate easier access. Use the following command:

```sh
sudo mv nextflow /usr/local/bin/
```

### Verifying the Installation

To confirm that Nextflow is installed successfully, check its version with the following command:

```sh
nextflow -v
```

You should see an output such as:

```sh
nextflow version 20.10.0.5430
```
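
If you want to compare the reported version against a minimum in a script, the string can be turned into a tuple of integers. This sketch assumes the `version X.Y.Z.build` format shown above; `parse_nextflow_version` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple


def parse_nextflow_version(text: str) -> Optional[Tuple[int, ...]]:
    """Turn 'nextflow version 20.10.0.5430' into (20, 10, 0, 5430)."""
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


reported = parse_nextflow_version("nextflow version 20.10.0.5430")
print(reported)              # prints (20, 10, 0, 5430)
print(reported >= (20, 10))  # prints True
```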

---

## Setting up the Environment

After successfully installing Nextflow, the next step involves configuring your environment to run workflows effectively. This setup includes creating configuration files and integrating container technologies like Docker.

### Configuration File

Create a configuration file named `nextflow.config` in the root directory of your project. This file will define the resources and parameters for your workflows:

```groovy
process {
    executor = 'local'
    cpus = 2
    memory = '4 GB'
}
```

### Docker Integration

To leverage Docker, ensure Docker is installed and running on your machine. Add the following lines to your `nextflow.config` file for Docker integration:

```groovy
docker {
    enabled = true
}
```
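
When a project is set up from a script, the two configuration blocks above can be written in one step. This is a minimal sketch: the config text is copied from this section, `write_config` is a hypothetical helper, and the demonstration uses a temporary directory so no real project is touched.

```python
import tempfile
from pathlib import Path

# Both blocks from this section, combined into one file.
CONFIG_TEXT = """\
process {
    executor = 'local'
    cpus = 2
    memory = '4 GB'
}

docker {
    enabled = true
}
"""


def write_config(project_dir: str, overwrite: bool = False) -> Path:
    """Write nextflow.config into project_dir without clobbering an existing file."""
    target = Path(project_dir) / "nextflow.config"
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists")
    target.write_text(CONFIG_TEXT)
    return target


with tempfile.TemporaryDirectory() as tmp:
    print(write_config(tmp).read_text())
```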

### Test Setup

Create a basic Nextflow script named `hello.nf` to confirm that your environment is configured correctly:

```groovy
process sayHello {
    output:
    stdout

    script:
    """
    echo 'Hello, Nextflow!'
    """
}

workflow {
    sayHello()
}
```

Run this script with:

```sh
nextflow run hello.nf
```

---

## Verifying the Installation

Ensuring the proper installation and configuration of Nextflow is crucial for a reliable setup for your future workflows. Follow these steps to verify that everything is functioning correctly:

### Run the Sample Script

Utilize the `hello.nf` script from the previous section to check whether Nextflow executes it without any issues:

```sh
nextflow run hello.nf
```

### Docker Test Script

If Docker integration is enabled, create a script that utilizes Docker. Name this file `dockerTest.nf` and include the following content:

```groovy
process helloDocker {
    container 'ubuntu:latest'

    output:
    stdout

    script:
    """
    echo 'Hello from Docker container!'
    """
}

workflow {
    helloDocker()
    // print the captured stdout so the message shows up in the console
    helloDocker.out.view()
}
```

Run this script with:

```sh
nextflow run dockerTest.nf
```

### Monitor Logs

Ensure the logs show no errors, indicating a successful execution:

```sh
N E X T F L O W  ~  version 20.10.0.5430
Launching `dockerTest.nf` [practical_ganguly] - revision: abc123
executor >  local (1)
[42/993ab1] process > helloDocker [100%] 1 of 1 ✔
Hello from Docker container!
```
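
For unattended runs, the progress line can be checked programmatically rather than by eye. The sketch below parses a line in the format shown above and reports whether all tasks finished. The pattern is an assumption based on this sample output only, and `parse_progress` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Matches lines such as: [42/993ab1] process > helloDocker [100%] 1 of 1
PROGRESS = re.compile(
    r"process > (?P<name>.+?) \[\s*(?P<percent>\d+)%\] (?P<done>\d+) of (?P<total>\d+)"
)


def parse_progress(line: str) -> Optional[Dict[str, object]]:
    """Return the process name and task counts, or None for other log lines."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    done, total = int(match["done"]), int(match["total"])
    return {"name": match["name"], "done": done, "total": total, "complete": done == total}


line = "[42/993ab1] process > helloDocker [100%] 1 of 1 ✔"
print(parse_progress(line))
```

A complete progress line does not rule out errors reported elsewhere in the log, so treat this as a first check only.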

---

## Conclusion

Installing Nextflow is a straightforward endeavor if you adhere to the documentation and ensure your system meets the necessary prerequisites. By following these steps, you gain a powerful tool capable of creating and managing complex computational workflows. Nextflow is versatile, adaptable to both on-premise and cloud infrastructures, making it particularly beneficial for bioinformatics and genomics research.

A proper environment setup and thorough verification of the installation lay a solid foundation for leveraging Nextflow's full potential. Whether analyzing large-scale genomic data or tackling other computational tasks, Nextflow's flexible and scalable nature will significantly enhance workflow efficiency and reproducibility.

**Happy data processing with Nextflow!**

---

## Visualization

### Nextflow Installation Workflow

To enhance understanding, the following flowchart illustrates the installation and configuration process:

![Nextflow Installation Workflow](https://example.com/nextflow-installation-workflow.png)

---

This chapter provides a comprehensive look into the installation and basic setup of Nextflow, a powerful tool for workflow automation and management. By following this guide, you are well on your way to leveraging Nextflow's full potential in your computational research and projects.

--------------------------------------------------------------------------------