## Chat with OpenAI Assistant using function call in AutoGen: OSS Insights for Advanced GitHub Data Analysis

This Jupyter Notebook demonstrates how to leverage OSS Insight (Open Source Software Insight) for advanced GitHub data analysis by defining `Function calls` in AutoGen for the OpenAI Assistant. 

The notebook is structured into four main sections:

1. Function Schema and Implementation
2. Defining an OpenAI Assistant Agent in AutoGen
3. Fetching GitHub Insight Data using Function Call

### Requirements

AutoGen requires `Python>=3.8`. To run this notebook example, please install:
```bash
pip install pyautogen
```

In [None]:
%%capture --no-stderr
# %pip install "pyautogen~=0.2.0b5

### Function Schema and Implementation

This section provides the function schema definition and their implementation details. These functions are tailored to fetch and process data from GitHub, utilizing OSS Insight's capabilities.

In [1]:
import logging
import os
import requests

logger = logging.getLogger(__name__)
logger.setLevel(logging.WARNING)

ossinsight_api_schema = {
  "name": "ossinsight_data_api",
  "parameters": {
    "type": "object",
    "properties": {
      "question": {
        "type": "string",
        "description": (
            "Enter your GitHub data question in the form of a clear and specific question to ensure the returned data is accurate and valuable. "
            "For optimal results, specify the desired format for the data table in your request."
        ),
      }
    },
    "required": [
      "question"
    ]
  },
  "description": "This is an API endpoint allowing users (analysts) to input question about GitHub in text format to retrieve the realted and structured data."
}

def get_ossinsight(question):
    """
    Retrieve the top 10 developers with the most followers on GitHub.
    """
    url = "https://api.ossinsight.io/explorer/answer"
    headers = {"Content-Type": "application/json"}
    data = {
        "question": question,
        "ignoreCache": True
    }

    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        answer = response.json()
    else:
        return f"Request to {url} failed with status code: {response.status_code}"

    report_components = []
    report_components.append(f"Question: {answer['question']['title']}")
    if answer['query']['sql']  != "":
        report_components.append(f"querySQL: {answer['query']['sql']}")

    if answer.get('result', None) is None or len(answer['result']['rows']) == 0:
        result = "Result: N/A"
    else:
        result = "Result:\n  " + "\n  ".join([str(row) for row in answer['result']['rows']])
    report_components.append(result)

    if  answer.get('error', None) is not None:
        report_components.append(f"Error: {answer['error']}")
    return "\n\n".join(report_components)

### Defining an OpenAI Assistant Agent in AutoGen

Here, we explore how to define an OpenAI Assistant Agent within the AutoGen. This includes setting up the agent to make use of the previously defined function calls for data retrieval and analysis.

In [2]:
from autogen import config_list_from_json
from autogen.agentchat.contrib.gpt_assistant_agent import GPTAssistantAgent
from autogen import UserProxyAgent

assistant_id = os.environ.get("ASSISTANT_ID", None)
config_list = config_list_from_json("../OAI_CONFIG_LIST")
llm_config = {
    "config_list": config_list,
    "assistant_id": assistant_id,
     "tools": [
        {
            "type": "function",
            "function": ossinsight_api_schema,
        }
    ]
}

oss_analyst = GPTAssistantAgent(
    name="OSS Analyst",                            
    instructions=(
        "Hello, Open Source Project Analyst. You'll conduct comprehensive evaluations of open source projects or organizations on the GitHub platform, "
        "analyzing project trajectories, contributor engagements, open source trends, and other vital parameters. "
        "Please carefully read the context of the conversation to identify the current analysis question or problem that needs addressing."
    ),
    llm_config=llm_config,
)
oss_analyst.register_function(
    function_map={
        "ossinsight_data_api": get_ossinsight,
    }
)

GPT Assistant only supports one OpenAI client. Using the first client in the list.
assistant_id was None, creating a new assistant


### Fetching GitHub Insight Data using Function Call

This part of the notebook demonstrates the practical application of the defined functions and the OpenAI Assistant Agent in fetching and interpreting GitHub Insight data.

In [3]:
user_proxy = UserProxyAgent(name="user_proxy",
    code_execution_config={
        "work_dir": "coding"
    },
    is_termination_msg=lambda msg: "TERMINATE" in msg["content"],
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1)

user_proxy.initiate_chat(oss_analyst, message="Top 10 developers with the most followers")

[33muser_proxy[0m (to OSS Analyst):

Top 10 developers with the most followers

--------------------------------------------------------------------------------
[35m
>>>>>>>> EXECUTING FUNCTION ossinsight_data_api...[0m
[33mOSS Analyst[0m (to user_proxy):

The top 10 developers with the most followers on GitHub are:

1. **Linus Torvalds** (torvalds) - 166,730 followers
2. **Evan You** (yyx990803) - 86,239 followers
3. **Dan Abramov** (gaearon) - 77,611 followers
4. **Ruan YiFeng** (ruanyf) - 72,668 followers
5. **Jake Wharton** (JakeWharton) - 65,415 followers
6. **Zhihui Peng** (peng-zhihui) - 60,972 followers
7. **Brad Traversy** (bradtraversy) - 58,172 followers
8. **Gustavo Guanabara** (gustavoguanabara) - 52,143 followers
9. **Sindre Sorhus** (sindresorhus) - 51,542 followers
10. **TJ Holowaychuk** (tj) - 49,621 followers


--------------------------------------------------------------------------------
[33muser_proxy[0m (to OSS Analyst):



-------------------------------

In [1]:
get_weather_schema = {
    "name": "get_weather_api",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": (
                    "输入想要查询天气的所在地，例如，北京"
                ),
            }
        },
        "required": [
            "location"
        ]
    },
  "description": "这是一个能够查询指定地点天气状况的API接口"
}

def get_weather(location):
    print("get_weather called:", location)
    return "\n\n" + "天气晴，25摄氏度"


In [7]:
get_air_quality_schema = {
    "name": "get_air_quality_api",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": (
                    "输入想要查询空气质量的所在地，例如，北京"
                ),
            }
        },
        "required": [
            "location"
        ]
    },
  "description": "这是一个能够查询指定地点空气质量的API接口"
}

def get_air_quality(location):
    print("get_air_quality called:", location)
    return "\n\n" + "空气质量：良，有轻微污染，PM2.5指数是70"

In [2]:
get_visitor_flowrate_schema = {
    "name": "get_visitor_flowrate_api",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": (
                    "输入想要查询游客人流量的目的地，例如，颐和园"
                ),
            }
        },
        "required": [
            "location"
        ]
    },
  "description": "这是一个能够查询指定景点的游客人流量的接口"
}

def get_visitor_flowrate(location):
    print("get_visitor_flowrate called:", location)
    return "\n\n" + "人流量密度较大"

In [3]:
get_cur_location_schema = {
    "name": "get_cur_location_api",
    "parameters": {
        "type": "object",
        "properties": {
        },
        "required": [
        ]
    },
  "description": "这是一个能够获取当前定位位置的接口"
}

def get_cur_location():
    print("get_cur_location called:")
    return "\n\n" + "望京金辉大厦"

In [4]:
get_travel_time_schema = {
    "name": "get_travel_time_api",
    "parameters": {
        "type": "object",
        "properties": {
            "source": {
                "type": "string",
                "description": (
                    "输入行程的起点位置"
                ),
            },
            "destination": {
                "type": "string",
                "description": (
                    "输入行程的终点位置"
                ),
            }
        },
        "required": [
            "source",
            "destination"
        ]
    },
  "description": "这是一个能够根据输入行程起点和终点位置，获取行程时间的接口"
}

def get_travel_time(source, destination):
    print("get_travel_time called:", source, destination)
    return "\n\n" + "1小时"

In [5]:
get_train_list_schema = {
    "name": "get_train_list_api",
    "parameters": {
        "type": "object",
        "properties": {
            "source": {
                "type": "string",
                "description": (
                    "输入火车的起始站"
                ),
            },
            "destination": {
                "type": "string",
                "description": (
                    "输入火车的终点站"
                ),
            },
            "date": {
                "type": "string",
                "description": (
                    "输入要查询火车班次的日期"
                ),
            }
        },
        "required": [
            "source",
            "destination",
            "date"
        ]
    },
  "description": "这是一个能够根据起始站、终点站和日期查询火车班次列表的接口"
}

def get_train_list(source, destination, date):
    print("get_train_list called:", source, destination, date)
    return "\n\n" + "共有2班火车，上午11点和下午2点各有一班"

In [8]:
from autogen import config_list_from_json
from autogen.agentchat.contrib.gpt_assistant_agent import GPTAssistantAgent
from autogen import UserProxyAgent
import logging
import os
import requests

assistant_id = os.environ.get("ASSISTANT_ID", None)
config_list = config_list_from_json("../OAI_CONFIG_LIST")
llm_config = {
    "config_list": config_list,
    "assistant_id": assistant_id,
     "tools": [
        {
            "type": "function",
            "function": get_weather_schema,
        },
        {
            "type": "function",
            "function": get_air_quality_schema,
        },
        {
            "type": "function",
            "function": get_visitor_flowrate_schema,
        },
        {
            "type": "function",
            "function": get_cur_location_schema,
        },
        {
            "type": "function",
            "function": get_travel_time_schema,
        },
        {
            "type": "function",
            "function": get_train_list_schema,
        }
    ]
}

oss_analyst = GPTAssistantAgent(
    name="Weather_reporter",                            
    instructions=(
        "你是一个出行助手，你可以通过查询天气、查询空气质量、查询景点人流量、查询行程时间、查询定位等工具帮助人们定制出行计划。当工具所需信息不完全时，可以通过对话的方式来询问。回复中包含\"TERMINATE\"字符串，当可视化任务处理结束时。"
    ),
    llm_config=llm_config,
)
oss_analyst.register_function(
    function_map={
        "get_weather_api": get_weather,
        "get_air_quality_api":get_air_quality,
        "get_visitor_flowrate_api":get_visitor_flowrate,
        "get_cur_location_api":get_cur_location,
        "get_travel_time_api":get_travel_time,
        "get_train_list_api":get_train_list
    }
)

GPT Assistant only supports one OpenAI client. Using the first client in the list.
assistant_id was None, creating a new assistant


In [12]:
user_proxy = UserProxyAgent(name="user_proxy",
    code_execution_config={
        "work_dir": "coding"
    },
    is_termination_msg=lambda msg: "TERMINATE" in msg["content"],
    human_input_mode="ALWAYS",
    max_consecutive_auto_reply=1)

user_proxy.initiate_chat(oss_analyst, message="今天想去雁栖湖玩，有什么建议嘛")

[33muser_proxy[0m (to Weather_reporter):

今天想去雁栖湖玩，有什么建议嘛

--------------------------------------------------------------------------------
Assistants API决定要调用以下function:  ['get_cur_location_api']
[35m
>>>>>>>> EXECUTING FUNCTION get_cur_location_api...[0m
get_cur_location called:
Assistants API决定要调用以下function:  ['get_weather_api', 'get_air_quality_api']
[35m
>>>>>>>> EXECUTING FUNCTION get_weather_api...[0m
get_weather called: 雁栖湖
[35m
>>>>>>>> EXECUTING FUNCTION get_air_quality_api...[0m
get_air_quality called: 雁栖湖
Assistants API决定要调用以下function:  ['get_travel_time_api']
[35m
>>>>>>>> EXECUTING FUNCTION get_travel_time_api...[0m
get_travel_time called: 望京金荣大厦 雁栖湖
Assistants API决定要调用以下function:  ['get_visitor_flowrate_api']
[35m
>>>>>>>> EXECUTING FUNCTION get_visitor_flowrate_api...[0m
get_visitor_flowrate called: 雁栖湖
[33mWeather_reporter[0m (to user_proxy):

前往雁栖湖的主要建议如下：

1. **天气状况：** 雁栖湖今天天气晴朗，气温大约为25℃，适合户外活动。
2. **空气质量：** 雁栖湖当前的空气质量为‘良’，有轻微污染，PM2.5指数为70。如果对空气污染较为敏感，建