# Part 5: Application Push

<br>

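### Objectives

1. Compute a push priority from the tag relevance scores inferred by the fine-tuned model in the previous step;
2. Push articles to the third-party app Cubox according to that priority;
3. Push articles to the collaborative multidimensional table according to that priority;
4. Push articles to the collaboration bot according to that priority.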
   
<br>
<br>

In [117]:
import yaml
from IPython.display import display
import mysql.connector
import json
import pandas as pd
import os
import requests
from datetime import datetime
from dotenv import load_dotenv

load_dotenv()

True

<br>

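## 1. Push Priority Calculation
Because so many articles are collected, a subset has to be selected and pushed first. Here the recommendation priority is computed from the tag relevance scores inferred by the fine-tuned large model; adjusting the tag weights and the source weights changes the ranking of the pushed articles.

For each article, the relevance of every tag (`relevance`) is multiplied by the weight of that tag (`tag_priority[tag]`), and the sum is normalized by the total tag weight. The result is then multiplied by the weight of the article's source (`source_priority[source]`); a source that is missing from the preset priority list gets a default weight of 0.3. Finally, the score is rounded to two decimal places and capped at 10. With relevance values between 0 and 10, as in the data shown below, the normalization in `compute_category_score` (section 1.3) already keeps the score between 0 and 1, which matches the priorities in the outputs.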

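- **Weighted average over the tag weights** (`total_priority` is the sum of all tag weights; the factor 1/10 follows `compute_category_score` in section 1.3):
$$
\mathrm{Normalized\ Score} = \frac{1}{10} \times \sum_{\mathrm{tag}}\left(\mathrm{relevance}_{\mathrm{tag}} \times \frac{\mathrm{priority}_{\mathrm{tag}}}{\mathrm{total\_priority}}\right)
$$

- **Adjustment by the article source weight:**
$$
\text{Adjusted Score} = \text{Normalized Score} \times \text{source priority}
$$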
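
As a worked example of the two steps, the sketch below scores one article by hand. It follows the normalization used by `compute_category_score` in section 1.3 (dividing by 10 times the total tag weight). The reduced weight tables, the relevance values, the helper name `priority_score`, and the source name `UnknownFeed` are assumptions made for illustration only.

```python
# Illustrative weights and relevance values (a subset chosen for this example).
tag_priority = {"云计算": 1.0, "计算机": 0.7, "其他": 0.1}
source_priority = {"InfoQ": 1.0, "36kr": 0.4}
DEFAULT_SOURCE_WEIGHT = 0.3  # weight for sources missing from source_priority


def priority_score(relevance, source):
    """Weighted, normalized tag score scaled by the source weight."""
    total = 10 * sum(tag_priority.values())  # 10 x total tag weight
    normalized = sum(
        rel * tag_priority[tag]
        for tag, rel in relevance.items()
        if tag in tag_priority
    ) / total
    adjusted = normalized * source_priority.get(source, DEFAULT_SOURCE_WEIGHT)
    return round(adjusted, 2)


relevance = {"云计算": 10, "计算机": 8, "其他": 5}
# Weighted sum: 10*1.0 + 8*0.7 + 5*0.1 = 16.1; normalizer: 10 * 1.8 = 18
print(priority_score(relevance, "InfoQ"))        # 0.89
print(priority_score(relevance, "36kr"))         # 0.36
print(priority_score(relevance, "UnknownFeed"))  # 0.27
```

The same article drops from 0.89 to 0.36 when it comes from a source weighted 0.4, which shows how strongly the source weight shapes the final ranking.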

<br>

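### 1.1 Data Preparation
Query the articles to be pushed. These articles have been classified and tagged, but their priority has not been computed yet.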

In [118]:
with open('config_new.yaml', 'r', encoding='utf-8') as file:
    config = yaml.safe_load(file)

mysql_message = config['mysql']
connection = mysql.connector.connect(
        host=mysql_message['host'],
        port=mysql_message['port'],
        database=mysql_message['database'],
        user=mysql_message['user'],
        password=mysql_message['password'],
        ssl_disabled=True
    )

cursor = connection.cursor(dictionary=True)

In [119]:
query = """
            SELECT title, link, description_ai_summary, folder, source, category_relevance, tags
            FROM process_article_table
            LIMIT %s
            """
cursor.execute(query, (9999,))
query_list = cursor.fetchall()
display(pd.DataFrame(query_list))

Unnamed: 0,title,link,description_ai_summary,folder,source,category_relevance,tags
0,Create your fashion assistant application usin...,https://aws.amazon.com/blogs/machine-learning/...,本文介绍了如何使用Amazon Bedrock代理和Titan模型创建一个时尚助手应用，该助...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 7, ""架构师"": 6, ""计算机"": 8, ""个人娱乐""...","人工智能,计算机,云计算"
1,"How Aviva built a scalable, secure, and reliab...",https://aws.amazon.com/blogs/machine-learning/...,本文介绍了Aviva如何利用Amazon SageMaker构建一个可扩展、安全且可靠的ML...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 9, ""架构师"": 7, ""计算机"": 8, ""个人娱乐""...","人工智能,云计算,计算机"
2,Visier’s data science team boosts their model ...,https://aws.amazon.com/blogs/machine-learning/...,这篇文章介绍了Visier如何通过迁移到Amazon SageMaker，将模型输出提升十倍...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 1, ""云计算"": 9, ""架构师"": 7, ""计算机"": 8, ""个人娱乐""...","云计算,人工智能,计算机"
3,Achieve operational excellence with well-archi...,https://aws.amazon.com/blogs/machine-learning/...,本文讨论了如何通过使用Amazon Bedrock，构建架构良好的生成式人工智能解决方案以实...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 7, ""架构师"": 9, ""计算机"": 8, ""个人娱乐""...","人工智能,架构师,计算机"
4,Elevate workforce productivity through seamles...,https://aws.amazon.com/blogs/machine-learning/...,本文探讨了Amazon Q Business如何通过个性化提升工作效率，并介绍了如何利用这一...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 6, ""架构师"": 5, ""计算机"": 7, ""个人娱乐""...","人工智能,计算机,商业案例"
...,...,...,...,...,...,...,...
4326,前极越中层人士发声：夏一平原本没有通过面试,https://36kr.com/p/3080984075892614?f=rss,前极越中层人士表示，夏一平原本没有通过面试，却成为了百度造车的掌舵者。这一事件引发了业界对极...,36kr,36kr,"{""其他"": 5, ""云计算"": 2, ""架构师"": 1, ""计算机"": 3, ""个人娱乐""...","汽车行业,商业案例,经济观察"
4327,三星想用掌机振兴芯片业务，现实或许没那么美好,https://36kr.com/p/3078518733715080?f=rss,三星希望通过掌机市场振兴其Exynos芯片业务，但现实情况可能没有那么理想。过去Exynos...,36kr,36kr,"{""其他"": 5, ""云计算"": 2, ""架构师"": 2, ""计算机"": 8, ""个人娱乐""...","计算机,商业案例,经济观察"
4328,健身人的冬训，在滑雪场,https://36kr.com/p/3077426587039360?f=rss,滑雪成为冬季健身爱好者的重要选择，尤其在移动互联网的推动下，吸引了大量年轻人参与。自2015...,36kr,36kr,"{""其他"": 6, ""云计算"": 1, ""架构师"": 1, ""计算机"": 2, ""个人娱乐""...",个人娱乐
4329,《极越车主自救指南》：不幸买了“烂尾车”，车主该如何自救？,https://36kr.com/p/3077321746185735?f=rss,《极越车主自救指南》介绍了极越车主在公司倒闭后的自救方法。文章描述了极越车主在短短几天内经历...,36kr,36kr,"{""其他"": 7, ""云计算"": 0, ""架构师"": 0, ""计算机"": 2, ""个人娱乐""...","汽车行业,商业案例,经济观察"


<br>

### 1.2 Define Priority Weights

In [120]:
# Tag weights
tag_priority = {
    '计算机': 0.7,
    '云计算': 1,
    '人工智能': 1,
    '架构师': 1,
    '经济观察': 0.5,
    '商业案例': 0.5,
    '汽车行业': 0.6,
    '个人娱乐': 0.3,
    '其他': 0.1,
}
  
# Article source weights
source_priority = {
    'OpenAI_Blog': 1,
    'DeepMind-Google': 0.8,
    'Meta-Research': 1,
    'AWS-MachineLearning': 1,
    'Microsoft_Blog': 0.6,
    'InfoQ': 1,
    'deeplearning_ai': 1,
    '36kr': 0.4,
    'sspai': 0.3,
}

<br>

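### 1.3 Compute Priority

In [121]:
def compute_category_score(article):
    """Return the push priority of an article from its tag relevance and source."""
    # Parse the per-tag relevance scores, stored as a JSON string
    category_relevance = json.loads(article['category_relevance'])
    score = 0.0
    # Normalizer: 10 times the total tag weight (assumes relevance values of 0 to 10)
    total_score = 10 * sum(tag_priority.values())
    for tag, relevance in category_relevance.items():
        if tag in tag_priority:
            # Weight the relevance by the tag priority and normalize it
            score += relevance * tag_priority[tag] / total_score

    # Scale by the source weight; sources not listed default to 0.3
    score *= source_priority.get(article['source'], 0.3)
    return min(round(score, 2), 10.0)  # cap the score at 10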

In [122]:
for article in query_list:
    article['priority'] = compute_category_score(article)

display(pd.DataFrame(query_list))

Unnamed: 0,title,link,description_ai_summary,folder,source,category_relevance,tags,priority
0,Create your fashion assistant application usin...,https://aws.amazon.com/blogs/machine-learning/...,本文介绍了如何使用Amazon Bedrock代理和Titan模型创建一个时尚助手应用，该助...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 7, ""架构师"": 6, ""计算机"": 8, ""个人娱乐""...","人工智能,计算机,云计算",0.57
1,"How Aviva built a scalable, secure, and reliab...",https://aws.amazon.com/blogs/machine-learning/...,本文介绍了Aviva如何利用Amazon SageMaker构建一个可扩展、安全且可靠的ML...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 9, ""架构师"": 7, ""计算机"": 8, ""个人娱乐""...","人工智能,云计算,计算机",0.65
2,Visier’s data science team boosts their model ...,https://aws.amazon.com/blogs/machine-learning/...,这篇文章介绍了Visier如何通过迁移到Amazon SageMaker，将模型输出提升十倍...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 1, ""云计算"": 9, ""架构师"": 7, ""计算机"": 8, ""个人娱乐""...","云计算,人工智能,计算机",0.63
3,Achieve operational excellence with well-archi...,https://aws.amazon.com/blogs/machine-learning/...,本文讨论了如何通过使用Amazon Bedrock，构建架构良好的生成式人工智能解决方案以实...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 7, ""架构师"": 9, ""计算机"": 8, ""个人娱乐""...","人工智能,架构师,计算机",0.63
4,Elevate workforce productivity through seamles...,https://aws.amazon.com/blogs/machine-learning/...,本文探讨了Amazon Q Business如何通过个性化提升工作效率，并介绍了如何利用这一...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 6, ""架构师"": 5, ""计算机"": 7, ""个人娱乐""...","人工智能,计算机,商业案例",0.50
...,...,...,...,...,...,...,...,...
4326,前极越中层人士发声：夏一平原本没有通过面试,https://36kr.com/p/3080984075892614?f=rss,前极越中层人士表示，夏一平原本没有通过面试，却成为了百度造车的掌舵者。这一事件引发了业界对极...,36kr,36kr,"{""其他"": 5, ""云计算"": 2, ""架构师"": 1, ""计算机"": 3, ""个人娱乐""...","汽车行业,商业案例,经济观察",0.15
4327,三星想用掌机振兴芯片业务，现实或许没那么美好,https://36kr.com/p/3078518733715080?f=rss,三星希望通过掌机市场振兴其Exynos芯片业务，但现实情况可能没有那么理想。过去Exynos...,36kr,36kr,"{""其他"": 5, ""云计算"": 2, ""架构师"": 2, ""计算机"": 8, ""个人娱乐""...","计算机,商业案例,经济观察",0.16
4328,健身人的冬训，在滑雪场,https://36kr.com/p/3077426587039360?f=rss,滑雪成为冬季健身爱好者的重要选择，尤其在移动互联网的推动下，吸引了大量年轻人参与。自2015...,36kr,36kr,"{""其他"": 6, ""云计算"": 1, ""架构师"": 1, ""计算机"": 2, ""个人娱乐""...",个人娱乐,0.08
4329,《极越车主自救指南》：不幸买了“烂尾车”，车主该如何自救？,https://36kr.com/p/3077321746185735?f=rss,《极越车主自救指南》介绍了极越车主在公司倒闭后的自救方法。文章描述了极越车主在短短几天内经历...,36kr,36kr,"{""其他"": 7, ""云计算"": 0, ""架构师"": 0, ""计算机"": 2, ""个人娱乐""...","汽车行业,商业案例,经济观察",0.12


In [123]:
# Sort by priority in descending order and keep the top 5
top_5_queries = sorted(query_list, key=lambda x: x['priority'], reverse=True)[:5]

# Display the result
display(pd.DataFrame(top_5_queries))

Unnamed: 0,title,link,description_ai_summary,folder,source,category_relevance,tags,priority
0,对话阿里云 CIO 雁杨：AI 时代，企业如何做好智能化系统建设？,https://www.infoq.cn/video/5p16V2Y8K6Z9k0Matib...,阿里云CIO雁杨讨论了在AI时代企业要如何进行智能化系统建设。本文重点介绍了企业在AI和云计...,InfoQ,InfoQ,"{""其他"": 5, ""云计算"": 10, ""架构师"": 8, ""计算机"": 8, ""个人娱乐...","云计算,人工智能,计算机",0.75
1,对话阿里云 CIO 蒋林泉：AI 时代，企业如何做好智能化系统建设？,https://www.infoq.cn/article/xdeyGH8L3WgP57lPh...,阿里云 CIO 蒋林泉探讨了在 AI 时代企业智能化系统建设的策略和方法。,InfoQ,InfoQ,"{""其他"": 1, ""云计算"": 10, ""架构师"": 7, ""计算机"": 8, ""个人娱乐...","云计算,人工智能,计算机",0.72
2,How Clearwater Analytics is revolutionizing in...,https://aws.amazon.com/blogs/machine-learning/...,Clearwater Analytics正在利用生成式人工智能和Amazon SageMak...,AWS-MachineLearning,AWS-MachineLearning,"{""其他"": 3, ""云计算"": 10, ""架构师"": 8, ""计算机"": 8, ""个人娱乐...","云计算,人工智能,计算机",0.72
3,讯飞星火4.0 Turbo、超拟人数字人等11个首发，科大讯飞如何深入大模型国产化“无人区”,https://www.infoq.cn/article/ByiPIyq4GAClgdvy9...,科大讯飞发布了讯飞星火4.0 Turbo和超拟人数字人等11项新技术，展示其在大模型国产化领...,InfoQ,InfoQ,"{""其他"": 4, ""云计算"": 8, ""架构师"": 7, ""计算机"": 10, ""个人娱乐...","计算机,人工智能,云计算",0.71
4,极客游学启航：极客邦创始人霍太稳邀您共赴Vegas，探索全球技术盛会re:Invent！,https://www.infoq.cn/article/6iuWyVq4Px5y94q8B...,极客邦创始人霍太稳邀请您一同前往拉斯维加斯，探索全球技术盛会re:Invent。,InfoQ,InfoQ,"{""其他"": 2, ""云计算"": 10, ""架构师"": 8, ""计算机"": 9, ""个人娱乐...","云计算,计算机,架构师",0.7
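
When only a handful of top articles is needed from a long list, `heapq.nlargest` from the standard library returns the same result as sorting in descending order and slicing, without sorting the whole list. The sketch below uses hypothetical records; only the `priority` key matters.

```python
import heapq

# Hypothetical records standing in for query_list.
articles = [
    {"title": "a", "priority": 0.57},
    {"title": "b", "priority": 0.75},
    {"title": "c", "priority": 0.12},
    {"title": "d", "priority": 0.72},
]

# Equivalent to sorted(articles, key=..., reverse=True)[:2]
top_2 = heapq.nlargest(2, articles, key=lambda x: x["priority"])
print([a["title"] for a in top_2])  # ['b', 'd']
```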


---

<br>

## 2. Push to Cubox

![Cubox product demo](pictures/Cubox产品演示图.png)

In [98]:
cubox_url = os.getenv("cubox_url")

In [99]:
def send_post_request(json_data, retries=3, timeout=5):
    """POST one article to Cubox, trying again after a request error or a failed upload."""
    for attempt in range(retries):
        try:
            response = requests.post(cubox_url, json=json_data, timeout=timeout)
            code = response.json().get('code')  # parse the response body once
            if code == 200:
                print(f"【状态】 已发送至Cubox, {response.text}, 内容为{json_data}")
                return
            elif code == -3030:
                # Upload limit exceeded: another attempt would not help
                print(f"【状态】 已超过上传限制, 内容为{json_data}")
                return
            else:
                print(f"【状态】 上传失败，状态码: {code}, 内容为{json_data}")
        except (requests.exceptions.RequestException, ValueError) as e:
            # ValueError covers a response body that is not valid JSON
            print(f"{json_data['title']}：请求异常 {e}。")
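
`send_post_request` repeats a failed attempt immediately. A common refinement is to wait between attempts and double the delay each time (exponential backoff). The sketch below is one possible shape, not part of the original notebook: `post_with_retry`, `flaky_post`, and the injected `sleep` argument are hypothetical names, and the sender is passed in as a callable so the example runs without network access.

```python
import time


def post_with_retry(post, payload, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call post(payload) until it returns a truthy result or retries run out."""
    for attempt in range(retries):
        try:
            result = post(payload)
            if result:
                return result
        except Exception as exc:  # in real use, narrow this to request errors
            print(f"attempt {attempt + 1} failed: {exc}")
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x ... base_delay
    return None


# Demo: a sender that fails twice, then succeeds.
calls = {"n": 0}


def flaky_post(payload):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return {"code": 200}


delays = []  # record the delays instead of actually sleeping
result = post_with_retry(flaky_post, {"title": "demo"}, sleep=delays.append)
print(result, delays)  # {'code': 200} [1.0, 2.0]
```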

In [116]:
for article in top_5_queries:
    # Split the comma-separated tag string; fall back to an empty list
    tags_list = [t for t in (article.get('tags') or '').split(',') if t]
    content_dict = {
        'type': 'url',
        'content': article.get('link', '未知链接'),
        'title': article.get('title', '无标题'),
        # Prefix the summary with the priority score
        'description': f"【{article.get('priority', 'N/A')}】{article.get('description_ai_summary', '无描述')}",
        'tags': tags_list,
        'folder': article.get('folder', '默认')
    }
    send_post_request(content_dict)

【状态】 已超过上传限制, 内容为{'type': 'url', 'content': 'https://www.infoq.cn/video/5p16V2Y8K6Z9k0MatibO?utm_source=rss&utm_medium=article', 'title': '对话阿里云 CIO 雁杨：AI 时代，企业如何做好智能化系统建设？', 'description': '【0.75】阿里云CIO雁杨讨论了在AI时代企业要如何进行智能化系统建设。本文重点介绍了企业在AI和云计算领域的应用及其重要性。', 'tags': ['云计算', '人工智能', '计算机'], 'folder': 'InfoQ'}
【状态】 已超过上传限制, 内容为{'type': 'url', 'content': 'https://www.infoq.cn/article/xdeyGH8L3WgP57lPhFaF?utm_source=rss&utm_medium=article', 'title': '对话阿里云 CIO 蒋林泉：AI 时代，企业如何做好智能化系统建设？', 'description': '【0.72】阿里云 CIO 蒋林泉探讨了在 AI 时代企业智能化系统建设的策略和方法。', 'tags': ['云计算', '人工智能', '计算机'], 'folder': 'InfoQ'}
【状态】 已超过上传限制, 内容为{'type': 'url', 'content': 'https://aws.amazon.com/blogs/machine-learning/how-clearwater-analytics-is-revolutionizing-investment-management-with-generative-ai-and-amazon-sagemaker-jumpstart/', 'title': 'How Clearwater Analytics is revolutionizing investment management with generative AI and Amazon SageMaker JumpStart', 'description': '【0.72】Clearwater Analytics正在利用生成式人工智能和Amazon S

---

<br>

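## 3. Insert into the Collaborative Multidimensional Table

### 3.1 Multidimensional Table Configuration
![Collaborative table integration diagram](pictures/协作表格配置图.png)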

<br>

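### 3.2 Query Unpushed Records

In [124]:
# Reset is_pushed_table to 0 on the record with the largest primary-key id (for demonstration only; a scheduled backend job normally keeps this field up to date)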
cursor.execute("""
    UPDATE process_article_table
    SET is_pushed_table = %s
    WHERE id = (SELECT max_id FROM (SELECT MAX(id) AS max_id FROM process_article_table) AS temp)
""", (0,))
connection.commit()
print("Successfully updated the latest inserted record!")

Successfully updated the latest inserted record!


In [125]:
# Query the records that have not been pushed yet
cursor.execute("SELECT * FROM process_article_table WHERE is_pushed_table = FALSE")
unpushed_list = cursor.fetchall()
display(pd.DataFrame(unpushed_list))


<br>

### 3.3 Build the Webhook Request

In [126]:
def send_to_webhook(article, webhook_url):
    """POST one article record to the webhook; return True on an HTTP success status."""
    def json_serial(obj):
        """Convert datetime objects to ISO 8601 strings for JSON serialization."""
        if isinstance(obj, datetime):
            return obj.isoformat()
        raise TypeError(f"Type {type(obj)} not serializable")

    headers = {
        'Content-Type': 'application/json',
        'Origin': 'www.kdocs.cn'  # or 'www.wps.cn'
    }
    # Send the article as the POST body; json_serial handles datetime fields
    try:
        response = requests.post(
            webhook_url,
            data=json.dumps(article, default=json_serial),
            headers=headers,
            timeout=10  # avoid hanging indefinitely on an unresponsive endpoint
        )
        # Print the response for inspection
        print("Response Status Code:", response.status_code)
        print("Response Content:", response.text)
        return response.ok

    except requests.RequestException as e:
        print("Error sending webhook:", str(e))
        return False
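
The `default=` hook matters because rows fetched from the database can contain `datetime` values, which `json.dumps` cannot serialize on its own. The standalone sketch below shows the effect; the record and its `created_at` field are hypothetical.

```python
import json
from datetime import datetime


def json_serial(obj):
    """Convert datetime objects to ISO 8601 strings; reject anything else."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Type {type(obj)} not serializable")


# Hypothetical record with a datetime column.
record = {"title": "demo", "created_at": datetime(2024, 12, 20, 8, 30, 0)}

try:
    json.dumps(record)  # fails: datetime is not JSON serializable
except TypeError as exc:
    print("without default:", exc)

payload = json.dumps(record, default=json_serial)
print(payload)  # {"title": "demo", "created_at": "2024-12-20T08:30:00"}
```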


In [129]:
xiezuo_webhook_url = os.getenv("xiezuo_webhook_url")
for article in unpushed_list:
    send_to_webhook(article, xiezuo_webhook_url)


Response Status Code: 200
Response Content: {"result":"ok"}
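
Once a record has been delivered, its `is_pushed_table` flag has to be set so that the next query skips it. The notebook leaves this to a scheduled backend job; the sketch below shows one way such an update could look. The helper `mark_pushed` and the `RecordingCursor` stand-in are hypothetical, and the statement assumes the `id` and `is_pushed_table` columns used earlier in this section.

```python
def mark_pushed(cursor, article_ids):
    """Set is_pushed_table = 1 for the given ids; return how many ids were submitted."""
    if not article_ids:
        return 0
    # One %s placeholder per id keeps the statement parameterized
    placeholders = ", ".join(["%s"] * len(article_ids))
    sql = (
        "UPDATE process_article_table SET is_pushed_table = 1 "
        f"WHERE id IN ({placeholders})"
    )
    cursor.execute(sql, tuple(article_ids))
    return len(article_ids)


class RecordingCursor:
    """Stand-in for a database cursor that only records calls (demo only)."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


demo_cursor = RecordingCursor()
count = mark_pushed(demo_cursor, [101, 102])
print(count, demo_cursor.calls[0])
```

With a real `mysql.connector` cursor, the call would be followed by `connection.commit()`, as in the update cell of section 3.2.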


---
<br>

## 4. Collaboration Bot Push

![Collaboration bot configuration guide](pictures/协作机器人配置说明.png)

In [130]:
from push_article.notifier import Notification


company_id = os.getenv("company_id")
app_id = os.getenv("app_id")
app_secret = os.getenv("app_secret")
user_id = os.getenv("uer_id")  # the environment key keeps its original spelling "uer_id"

In [131]:
notifier = Notification(company_id, app_id, app_secret)
notifier.send_card(user_id, top_5_queries, template='push_article/msg_card.json')



协作消息已成功推送！
