#### ***Deep Association Mining***

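* Extract features from a social network dataset and train a model on them
* Target functionality (using gender inference as the example):
     * Infer a user's gender from features such as their number of friends and social activity level
     * These features can be defined separately or computed statistically from the relationship graph
     * Output **the model's accuracy on the test set** and **the predicted gender of a given user**
* The data are currently generated at random (nothing is deployed in a real project yet; this is only a conceptual model for framework design). Because the labels are random, the reported accuracy hovers around chance level and varies from run to run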
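Since the features above can be computed from the relationship graph instead of being generated independently, here is a minimal sketch of deriving two per-user features from an edge list. It assumes an undirected, unweighted graph; the helper names `build_graph` and `graph_features` and the `neighbor_activity` feature (the mean activity of a user's friends) are hypothetical illustrations, not part of the original code.

```python
from collections import defaultdict


def build_graph(edges):
    """Hypothetical helper: undirected adjacency list (user -> set of friends), self-loops ignored."""
    graph = defaultdict(set)
    for u, v in edges:
        if u != v:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def graph_features(users, graph, activity):
    """Hypothetical graph-derived features per user:
    - friends: number of distinct friends (node degree)
    - neighbor_activity: mean activity of the user's friends (0.0 if the user has none)
    """
    features = {}
    for user in users:
        friends = graph.get(user, set())  # .get avoids inserting isolated users into the defaultdict
        if friends:
            neighbor_activity = sum(activity[f] for f in friends) / len(friends)
        else:
            neighbor_activity = 0.0
        features[user] = {'friends': len(friends), 'neighbor_activity': neighbor_activity}
    return features


users = ['A', 'B', 'C', 'D']
edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'C')]  # ('C', 'C') is a self-loop and is skipped
activity = {'A': 0.2, 'B': 0.4, 'C': 0.9, 'D': 0.5}
print(graph_features(users, build_graph(edges), activity))
```

Users without any edges (here `D`) get zero for both features, so they stay usable as model inputs.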

In [20]:
import random
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler


# Generate random example data:
# the users used for training and evaluation, plus one new user whose gender is to be predicted
users = [f"User_{i}" for i in range(1, 21)]
# random.sample picks two distinct users, so no self-loops are generated
edges = [tuple(random.sample(users, 2)) for _ in range(30)]
user_features = {user: {'gender': random.choice(['male', 'female']),  # prediction target
                        'activity': random.uniform(0, 1),  # social activity level
                        'friends': random.randint(0, 100)}  # number of friends
                 for user in users}
# The new user's gender is unknown; only the input features are given
new_user_features = {'activity': random.uniform(0, 1),
                     'friends': random.randint(0, 100)}

# Data collection and preprocessing: build an undirected adjacency list from the edge list
# (the graph is not yet used as a feature source in this prototype)
graph = {}
for user1, user2 in edges:
    graph.setdefault(user1, []).append(user2)
    graph.setdefault(user2, []).append(user1)

# Feature engineering
# Encode the gender label as 0/1 with gender_mapping; it is the target y, not an input feature
# The remaining features (activity, friends) form the rows of feature_matrix
gender_mapping = {'male': 0, 'female': 1}
feature_names = ['activity', 'friends']
labels = [gender_mapping[user_features[user]['gender']] for user in users]
feature_matrix = [[user_features[user][name] for name in feature_names] for user in users]

# Model training and evaluation
# Split into training and test sets, then standardize the features so that 'friends' (0-100)
# does not dominate the Euclidean distance over 'activity' (0-1)
# Create a KNN classifier with hyperparameters such as n_neighbors and weights,
# train it, and run a preliminary evaluation on the test set
X_train, X_test, y_train, y_test = train_test_split(feature_matrix, labels, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
knn = KNeighborsClassifier(n_neighbors=5, weights='distance', metric='euclidean')
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Decision and prediction
# Convert the new user's features into a vector in the same order, apply the same scaling,
# then predict with the trained KNN model (0 = male, 1 = female)
new_user_feature_vector = [new_user_features[name] for name in feature_names]
prediction = knn.predict(scaler.transform([new_user_feature_vector]))
print("Prediction:", prediction)


Accuracy: 0.5
Prediction: [1]
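With only 20 users and `test_size=0.2`, the test set holds 4 users, so the reported accuracy can only take the values 0, 0.25, 0.5, 0.75 or 1 and changes with every random draw. A steadier estimate comes from k-fold cross-validation. The sketch below is an illustration under stated assumptions: it uses 200 randomly generated users (not the original 20), a fixed seed, and scikit-learn's `make_pipeline` and `cross_val_score` so that scaling is fitted inside each training fold only. Because the labels are drawn independently of the features, the mean accuracy should be expected to stay near chance level (about 0.5).

```python
import random

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

random.seed(0)  # fixed seed so the run is reproducible

# Assumption: same kind of random data as above, but 200 users instead of 20.
# Two features per user (activity, friends) and a 0/1 gender label drawn independently of them.
n_users = 200
X = [[random.uniform(0, 1), random.randint(0, 100)] for _ in range(n_users)]
y = [random.choice([0, 1]) for _ in range(n_users)]

# The scaler is part of the pipeline, so it is re-fitted on each training fold only
model = make_pipeline(StandardScaler(),
                      KNeighborsClassifier(n_neighbors=5, weights='distance', metric='euclidean'))

# 5-fold cross-validation (stratified by default for classifiers)
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean and spread across folds makes it visible whether a change to the features actually helps, instead of reading too much into a single 4-user test split.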


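#### ***User Recommendation Algorithm***
* The user recommendation algorithm implements the following:
     * Recommendation based on a user similarity matrix: for a target user, rank all other users by their similarity to the target and pick the most similar ones
     * Recommend those similar users who are not yet the target's friends (similar to "expanding one's circle" in a social network)
     * Only user-to-user recommendation is shown here; like the previous part, it is a framework design with no production implementation. A later extension could push categorized content based on similar users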
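The similarity used in the code below counts common friends and normalizes by the two users' friend counts. Writing N(u) for the friend set of user u, and working through two pairs from the sample graph (User_1's friends are {User_2, User_3}, User_4's are {User_2}, User_2's are {User_1, User_3, User_4}):

```latex
\mathrm{sim}(u, v) = \frac{|N(u) \cap N(v)|}{|N(u)| + |N(v)|}

\mathrm{sim}(\text{User\_1}, \text{User\_4}) = \frac{|\{\text{User\_2}\}|}{2 + 1} = \frac{1}{3},
\qquad
\mathrm{sim}(\text{User\_1}, \text{User\_2}) = \frac{|\{\text{User\_3}\}|}{2 + 3} = \frac{1}{5}
```

This measure equals half the Sørensen-Dice coefficient 2|A ∩ B| / (|A| + |B|), so it never exceeds 1/2; the Jaccard index |A ∩ B| / |A ∪ B| is a common alternative if a [0, 1] range is preferred.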

In [22]:
import numpy as np

# Build the user similarity matrix
# Here the similarity only takes the number of common friends into account:
# sim(u, v) = |common friends| / (|friends of u| + |friends of v|)
# This is only suitable for fairly simple social networks
def build_user_similarity_matrix(graph):
    users = list(graph.keys())
    n_users = len(users)
    similarity_matrix = np.zeros((n_users, n_users))
    
    for i in range(n_users):
        for j in range(i+1, n_users):
            friends1 = set(graph[users[i]])
            friends2 = set(graph[users[j]])
            common_friends = friends1 & friends2
            similarity = len(common_friends) / (len(friends1) + len(friends2))
            similarity_matrix[i, j] = similarity
            similarity_matrix[j, i] = similarity
    
    # Return the user order as well, so that matrix indices can be mapped back to user names
    return users, similarity_matrix

# Recommend based on the user similarity matrix:
# visit the other users in descending order of similarity and keep those with a positive
# similarity who are not already friends of target_user, up to n_recommendations users
def user_based_recommendation(similarity_matrix, users, graph, target_user, n_recommendations=5):
    target_user_index = users.index(target_user)
    target_user_similarities = similarity_matrix[target_user_index]
    sorted_indices = np.argsort(target_user_similarities)[::-1]
    
    recommendations = []
    for i in sorted_indices:
        if len(recommendations) == n_recommendations:
            break
        similar_user = users[i]
        # The diagonal is 0, so target_user itself is filtered out by the similarity check
        if target_user_similarities[i] > 0 and similar_user not in graph[target_user]:
            recommendations.append(similar_user)
    
    return recommendations

# Generate example data
edges = [('User_1', 'User_2'), ('User_1', 'User_3'), ('User_2', 'User_3'), ('User_2', 'User_4'), ('User_3', 'User_5')]
# User attributes; not used by the common-friends similarity above
user_features = {'User_1': {'gender': 'male'}, 'User_2': {'gender': 'female'}, 'User_3': {'gender': 'male'}, 'User_4': {'gender': 'female'}, 'User_5': {'gender': 'male'}}
target_user = 'User_1'

# Data collection and preprocessing: build an undirected adjacency list from the edge list
graph = {}
for user1, user2 in edges:
    graph.setdefault(user1, []).append(user2)
    graph.setdefault(user2, []).append(user1)

# Build the user similarity matrix (users holds the row/column order)
users, similarity_matrix = build_user_similarity_matrix(graph)

# Recommend users to the target user
recommendations = user_based_recommendation(similarity_matrix, users, graph, target_user)

print("Recommendations for", target_user, ":", recommendations)


Recommendations for User_1 : ['User_5', 'User_4']
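The double loop above computes one user pair at a time. As a sketch of an alternative, the same common-friends similarity can be written with matrix operations: for a 0/1 adjacency matrix A, the entry (A @ A)[i, j] counts the common friends of users i and j, and the row sums of A are the friend counts. The function name `similarity_matrix_vectorized` and the choice to order users by sorted name are assumptions for illustration, not part of the original code.

```python
import numpy as np


def similarity_matrix_vectorized(edges):
    """Hypothetical vectorized variant of the common-friends similarity.

    sim[i, j] = |N(i) & N(j)| / (|N(i)| + |N(j)|), with a zero diagonal.
    """
    users = sorted({u for edge in edges for u in edge})  # assumption: users ordered by name
    index = {user: i for i, user in enumerate(users)}

    # Symmetric 0/1 adjacency matrix; duplicate edges collapse to a single 1
    A = np.zeros((len(users), len(users)), dtype=int)
    for u, v in edges:
        if u != v:
            A[index[u], index[v]] = 1
            A[index[v], index[u]] = 1

    common = A @ A                            # common-friend counts for every pair
    degree = A.sum(axis=1)                    # friend count per user
    denom = degree[:, None] + degree[None, :]
    sim = np.divide(common, denom, out=np.zeros(common.shape), where=denom > 0)
    np.fill_diagonal(sim, 0.0)                # a user is not compared with itself
    return users, sim


edges = [('User_1', 'User_2'), ('User_1', 'User_3'), ('User_2', 'User_3'),
         ('User_2', 'User_4'), ('User_3', 'User_5')]
users, sim = similarity_matrix_vectorized(edges)
print(users)
print(np.round(sim, 3))
```

For this sample graph the sorted order coincides with the order in which users first appear in the edge list, so the matrix matches the loop version entry by entry. A dense matrix product costs O(n³); for a large, sparse social graph a sparse matrix type (for example from `scipy.sparse`) would be the more natural choice.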
