# Choice of Graph Representation

Ref 
- https://www.youtube.com/watch?v=P-m1Qv6-8cI&list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&index=3
- https://www.youtube.com/watch?v=3IS7UhNMQ3U&list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&index=5

**Keywords**

Objects: nodes, vertices -> N

Interactions: links, edges -> E

System: network, graph -> G(N,E)
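
As a minimal sketch of the notation above, an undirected graph G(N, E) can be stored as an adjacency list built from an edge list. The helper name `build_adjacency` and the four-node example graph are illustrative assumptions, not taken from the referenced lectures.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list (node -> set of neighbors)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


# Hypothetical example: a triangle A-B-C with a pendant node D attached to C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adjacency(edges)

nodes = set(adj)        # N: the set of objects
num_edges = len(edges)  # |E|: the number of interactions
```

An adjacency list keeps neighbor lookups cheap on sparse graphs, which is why it is used in this sketch; an adjacency matrix or a plain edge list would represent the same G(N, E).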

## Node-level Tasks in Graph Neural Networks (GNNs)

Node-level tasks involve making predictions or extracting information specific to individual nodes within a graph. These tasks are fundamental in various applications where the focus is on understanding the properties or roles of nodes rather than the entire graph or edges.

#### Key Aspects of Node-level Tasks

1. **Node Classification**:
   - **Goal**: Predict the category or label of a node based on its features and the structure of the graph.
   - **Applications**: 
     - Social networks: Identifying the type of user (e.g., bot or human).
     - Citation networks: Classifying academic papers into different research fields.

2. **Node Regression**:
   - **Goal**: Predict a continuous value for each node.
   - **Applications**:
     - Financial networks: Estimating the risk score of individual accounts.
     - Social networks: Predicting the influence score of users.

3. **Node Clustering**:
   - **Goal**: Group nodes into clusters based on similarity in features and graph structure.
   - **Applications**:
     - Community detection in social networks.
     - Identifying functional modules in biological networks.

4. **Node Embedding**:
   - **Goal**: Learn a vector representation for each node that captures the node's features and its structural context in the graph.
   - **Applications**:
     - Recommender systems: Generating embeddings for users and items to compute similarity.
     - Link prediction: Using node embeddings to predict potential connections between nodes.

#### Examples of Node-level Tasks in Practice

1. **Social Network Analysis**:
   - Detecting user communities or predicting user interests based on social interactions.

2. **Biological Networks**:
   - Identifying gene or protein functions by analyzing biological graphs (predicting protein-protein interactions, by contrast, is a link-level task).

3. **Knowledge Graphs**:
   - Classifying entities (e.g., people, organizations) in a knowledge graph.

#### Why Node-level Tasks are Important

- **Granular Insights**: Provide detailed information at the node level, crucial for applications requiring fine-grained analysis.
- **Scalability**: Allow for scalable solutions by focusing on individual node properties rather than entire graph structures.
- **Versatility**: Applicable to various domains, including social networks, biology, finance, and more.

### Summary

Node-level tasks in GNNs are essential for extracting and predicting information specific to nodes within a graph. They include node classification, node regression, node clustering, and node embedding, each with significant applications across different domains. These tasks enable granular insights and versatile applications, making them a crucial aspect of graph-based machine learning.

## Difference between GDV and Clustering Coefficient

This section covers node features in the context of graphlets and compares three metrics: the Graphlet Degree Vector (GDV), degree, and clustering coefficient. Let's break down each concept and understand their differences.

### Graphlet Degree Vector (GDV)
- **Definition**: GDV is a vector of graphlet-based features for a node. Graphlets are small, connected, induced subgraphs of a larger network.
- **Purpose**: It records how often a node appears in each type of graphlet, giving insight into the local topology around the node.
- **Calculation**: For a given node, count the graphlets of each type (e.g., paths, triangles, squares) that it participates in, usually distinguishing the node's position within the graphlet. These counts form a vector.
  - Example: \( GDV(v) = [g_1(v), g_2(v), \ldots, g_n(v)] \) where \( g_i(v) \) is the number of graphlets of type \( i \) that node \( v \) is part of.
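
The counting step can be sketched for graphlets on at most three nodes. This is a minimal illustration, assuming four node positions: an edge endpoint (the degree), the end of an induced 2-edge path, the middle of an induced 2-edge path, and a triangle corner. The function name `gdv_up_to_3_nodes` and the example graph are hypothetical; a full GDV usually covers larger graphlets as well.

```python
from itertools import combinations


def gdv_up_to_3_nodes(adj, v):
    """Return [degree, path-end count, path-middle count, triangle count] for node v."""
    nbrs = adj[v]
    deg = len(nbrs)
    # Triangles: pairs of neighbors of v that are themselves connected.
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # v is the middle of an induced path when two of its neighbors are NOT connected.
    path_middle = deg * (deg - 1) // 2 - triangles
    # v is the end of an induced path v-u-w when w is not v and not adjacent to v.
    path_end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    return [deg, path_end, path_middle, triangles]


# Hypothetical example: a triangle A-B-C with a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(gdv_up_to_3_nodes(adj, "C"))  # [3, 0, 2, 1]
print(gdv_up_to_3_nodes(adj, "D"))  # [1, 2, 0, 0]
```

Nodes C and D get different vectors even though both touch the same edge, which is the extra local detail the GDV provides over degree alone.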

### Degree
- **Definition**: Degree counts the number of edges that a node touches.
- **Purpose**: It measures how many direct connections a node has.
  - Example: If node A has 5 edges connected to it, its degree is 5.

### Clustering Coefficient
- **Definition**: The clustering coefficient is the fraction of pairs of a node's neighbors that are themselves connected. Equivalently, it counts the triangles that a node touches and normalizes by the maximum possible number.
- **Purpose**: It measures how close a node's neighbors are to forming a complete graph (i.e., how interconnected a node's neighbors are).
- **Calculation**:
  - For a node \( v \) with \( k \) neighbors, the clustering coefficient \( C(v) \) is given by \( C(v) = \frac{2E(v)}{k(k-1)} \), where \( E(v) \) is the number of edges between the neighbors of \( v \). For \( k < 2 \) the formula is undefined, and \( C(v) \) is commonly set to 0.
  - Example: If node B has 4 neighbors and 3 edges among them (so B touches 3 triangles), then \( C(B) = \frac{2 \cdot 3}{4 \cdot 3} = 0.5 \).
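
The formula \( C(v) = \frac{2E(v)}{k(k-1)} \) can be sketched directly on an adjacency list. The function name `clustering_coefficient` and the example graph are illustrative assumptions, and returning 0 for nodes with fewer than two neighbors is one common convention rather than part of the formula.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of neighbor pairs of v that are connected to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined case is reported as 0
    # E(v): number of edges between the neighbors of v.
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


# Hypothetical example: a triangle A-B-C with a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

c_a = clustering_coefficient(adj, "A")  # both neighbors are connected
c_c = clustering_coefficient(adj, "C")  # 1 of 3 neighbor pairs is connected
c_d = clustering_coefficient(adj, "D")  # fewer than 2 neighbors
```

In this example, A's two neighbors are connected, so its coefficient is 1, while C has three neighbor pairs and only one of them is connected.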

### Key Differences

1. **Focus**:
   - **Clustering Coefficient**: Measures local density of connections among a node's neighbors.
   - **GDV**: Measures the presence and types of subgraph structures (graphlets) a node is part of, capturing more complex local topological information.

2. **Information Captured**:
   - **Clustering Coefficient**: Focuses solely on triangles, providing a single value indicating local clustering.
   - **GDV**: Captures a spectrum of subgraph patterns, providing a richer, more detailed representation of local graph structure through a vector of counts.

3. **Complexity**:
   - **Clustering Coefficient**: Relatively simple to compute, involving only local neighbor connections.
   - **GDV**: More complex to compute, requiring enumeration of various graphlet types around the node.

### Summary
- **Degree**: Counts the number of direct edges a node has.
- **Clustering Coefficient**: Measures the fraction of a node's neighbor pairs that are connected (triangles through the node, normalized), indicating local density of connections.
- **GDV**: Counts the number of various graphlets a node is part of, providing a detailed vector representation of local structural roles.

## Node-Level Features: Summary

Node features can be obtained in several ways, which fall into two categories: importance-based features and structure-based features. Here is a detailed explanation and summary.

#### Categories of Node Features

1. **Importance-Based Features**:
   - **Node Degree**: Measures the number of edges connected to a node. It indicates how many direct connections a node has.
   - **Different Node Centrality Measures**: Various metrics to determine the importance of a node within the network. Examples include:
     - **Betweenness Centrality**: Indicates how often a node appears on the shortest paths between other nodes.
     - **Closeness Centrality**: Measures how close a node is to all other nodes in the network.
     - **Eigenvector Centrality**: Evaluates the influence of a node based on the influence of its neighbors.

2. **Structure-Based Features**:
   - **Node Degree**: (Also listed here) Represents the number of direct connections a node has.
   - **Clustering Coefficient**: Measures how densely a node's neighbors are connected to each other. Specifically, it is the fraction of possible triangles through the node that actually exist, indicating local density.
   - **Graphlet Count Vector**: Also called the Graphlet Degree Vector (GDV), this captures the frequency and types of graphlets (small subgraphs) that a node is part of. It provides a more detailed view of the node's local topology.
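
Two of the centrality measures listed above can be sketched with the standard library only. This is a rough illustration under stated assumptions: the graph is connected and undirected, closeness uses the unnormalized form 1 / (sum of shortest-path lengths), and eigenvector centrality is approximated by a fixed number of power-iteration steps. The function names, the `iterations` parameter, and the example graph are hypothetical.

```python
from collections import deque


def closeness_centrality(adj, v):
    """1 / (sum of shortest-path lengths from v), using BFS on an unweighted graph."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return 1.0 / total if total > 0 else 0.0


def eigenvector_centrality(adj, iterations=100):
    """Approximate eigenvector centrality by repeated neighbor-score summation."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbors' current scores.
        new = {v: sum(score[u] for u in adj[v]) for v in adj}
        norm = sum(x * x for x in new.values()) ** 0.5
        if norm == 0:
            return score  # graph has no edges; nothing to propagate
        score = {v: x / norm for v, x in new.items()}
    return score


# Hypothetical example: a triangle A-B-C with a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

closeness = {v: closeness_centrality(adj, v) for v in adj}
eigen = eigenvector_centrality(adj)
```

In this example both measures rank C highest, since it is adjacent to every other node, and D lowest. Betweenness centrality is omitted here because it needs shortest-path counting over all node pairs.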

#### Summary Explanation

- **Importance-Based Features**:
  - These features are focused on the significance or influence of a node within the entire network.
  - Node Degree is a basic measure of connectivity.
  - Centrality measures provide a deeper insight into the role of a node, considering paths and influence within the network.

- **Structure-Based Features**:
  - These features capture the local structural properties of the node.
  - Node Degree again measures connectivity.
  - Clustering Coefficient gives an idea of the local clustering or cohesiveness around the node.
  - Graphlet Count Vector (or GDV) offers a detailed account of the node's participation in various small subgraph patterns, providing a nuanced view of its structural role.