
# 1. Converting Tabular Dataset To Graph Dataset

I hope this notebook helps you to convert your CSV file into a graph dataset 🚀


`Step 0`

Bring some creativity and don't lose hope. It is natural for rearranging data into a graph format to take some time. Also, this notebook is only meant to help you get started; you will have to adapt it to your specific use case.

`Step 1`

To get started, identify the following in your dataset (real-world examples follow later in this notebook):

- Nodes (Items, People, Locations, Cars, ...)
- Edges (Connections, Interactions, Similarity, ...)
- Node Features (Attributes)
- Labels (Node-level, edge-level, graph-level)

and optionally:
- Edge weights (Strength of the connection, number of interactions, ...)
- Edge features (Additional (multi-dim) properties describing the edge)
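To make the checklist above concrete, here is a minimal sketch of how each item could be stored as a plain array. All values are made up for illustration, and the variable names (`node_features`, `edge_index`, `edge_weight`, `edge_features`, `node_labels`) are assumptions rather than a required schema. The `[2, num_edges]` edge layout is one common convention (used, for example, by PyTorch Geometric); other libraries may expect a different layout.

```python
import numpy as np

# Toy, made-up example: 4 nodes with 2 attributes each.
node_features = np.array([
    [0.1, 1.0],
    [0.5, 0.0],
    [0.9, 1.0],
    [0.3, 0.0],
], dtype=np.float32)                  # shape [num_nodes, num_node_features]

# Edges as (source, target) node indices, stored column-wise.
edge_index = np.array([
    [0, 1, 1, 2],                     # source node of each edge
    [1, 0, 2, 3],                     # target node of each edge
], dtype=np.int64)                    # shape [2, num_edges]

# Optional: one scalar weight per edge, e.g. the number of interactions.
edge_weight = np.array([3.0, 3.0, 1.0, 2.0], dtype=np.float32)

# Optional: a multi-dimensional feature vector per edge.
edge_features = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
], dtype=np.float32)                  # shape [num_edges, num_edge_features]

# Node-level labels: one class per node (edge- or graph-level labels
# would instead have one entry per edge or per graph).
node_labels = np.array([0, 1, 0, 1], dtype=np.int64)

num_nodes = node_features.shape[0]
num_edges = edge_index.shape[1]

# Sanity checks: every edge must point at an existing node, and every
# per-edge / per-node array must have a matching length.
assert edge_index.min() >= 0 and edge_index.max() < num_nodes
assert edge_weight.shape[0] == num_edges
assert edge_features.shape[0] == num_edges
assert node_labels.shape[0] == num_nodes
```

Running these shape checks early tends to catch most mistakes made while mapping table rows to node indices.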



`Step 2`

Do you have different node and edge types? (That is, do your nodes or edges carry different sets of attributes, as with Cars vs. People?)

- No, all my edges/nodes have the same type --> **Proceed with 1.1**
- Yes, there are different relations and node types --> **Proceed with 1.2**
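As a rough illustration of the distinction, the sketch below stores a toy graph with one feature matrix per node type and one edge array per relation. The type names (`person`, `car`), the relation names, and the helper `is_heterogeneous` are hypothetical and only serve this example.

```python
import numpy as np

# Hypothetical toy data: two node types with different numbers of attributes.
node_features = {
    "person": np.zeros((3, 4), dtype=np.float32),  # 3 people, 4 attributes each
    "car": np.zeros((2, 6), dtype=np.float32),     # 2 cars, 6 attributes each
}

# Edges keyed by (source type, relation, target type).
# Node indices are local to each node type.
edge_index = {
    ("person", "owns", "car"): np.array([[0, 1, 2], [0, 0, 1]]),
    ("person", "knows", "person"): np.array([[0, 1], [1, 2]]),
}


def is_heterogeneous(node_features, edge_index):
    """Return True if the graph has more than one node type or edge type."""
    return len(node_features) > 1 or len(edge_index) > 1


print(is_heterogeneous(node_features, edge_index))  # True
```

A graph with a single node type and a single relation, such as players connected to other players, makes the helper return `False` and falls under the homogeneous case.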


## 1.1 Homogeneous

`Step 3 / Example 1`

To keep this as realistic as possible, I picked a publicly available dataset with homogeneous nodes: the [FIFA 21 Rating dataset](https://raw.githubusercontent.com/batuhan-demirci/fifa21_dataset), which describes soccer players.
Here we extract a small subset of the scraped data (much more is available!) and build a graph dataset from it. Have a look at the pandas DataFrame below.

```python
import pandas as pd

# Download data (quietly)
!wget -q https://raw.githubusercontent.com/batuhan-demirci/fifa21_dataset/master/data/tbl_player.csv
!wget -q https://raw.githubusercontent.com/batuhan-demirci/fifa21_dataset/master/data/tbl_player_skill.csv
!wget -q https://raw.githubusercontent.com/batuhan-demirci/fifa21_dataset/master/data/tbl_team.csv

# Load data
player_df = pd.read_csv("tbl_player.csv")
skill_df = pd.read_csv("tbl_player_skill.csv")
team_df = pd.read_csv("tbl_team.csv")

# Extract subsets
player_df = player_df[["int_player_id", "str_player_name", "str_positions", "int_overall_rating", "int_team_id"]]
skill_df = skill_df[["int_player_id", "int_long_passing", "int_ball_control", "int_dribbling"]]
team_df = team_df[["int_team_id", "str_team_name", "int_overall"]]

# Merge data
player_df = player_df.merge(skill_df, on="int_player_id")
fifa_df = player_df.merge(team_df, on="int_team_id")

# Sort dataframe
fifa_df = fifa_df.sort_values(by="int_overall_rating", ascending=False)
print("Players: ", fifa_df.shape[0])
fifa_df.head()
```

Players:  18767


| | int_player_id | str_player_name | str_positions | int_overall_rating | int_team_id | int_long_passing | int_ball_control | int_dribbling | str_team_name | int_overall |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Lionel Andrés Messi Cuccittini | RW, ST, CF | 93 | 5.0 | 91 | 96 | 96 | FC Barcelona | 84 |
| 33 | 2 | Cristiano Ronaldo dos Santos Aveiro | ST, LW | 92 | 6.0 | 77 | 92 | 88 | Juventus | 83 |
| 57 | 3 | Jan Oblak | GK | 91 | 8.0 | 40 | 30 | 12 | Atlético Madrid | 83 |
| 121 | 5 | Neymar da Silva Santos Júnior | LW, CAM | 91 | 7.0 | 81 | 95 | 95 | Paris Saint-Germain | 83 |
| 89 | 4 | Kevin De Bruyne | CAM, CM | 91 | 2.0 | 93 | 92 | 88 | Manchester City | 85 |
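One possible way to turn a table like this into the pieces from Step 1 is sketched below. It rests on several assumptions that are choices, not something the data dictates: players are the nodes, the three skill columns are the node features, the overall rating is the node-level label, and two players are connected when they share a team. The rows in `toy_df` are made up so the example runs without downloading anything; only the column names match the table above.

```python
import itertools

import numpy as np
import pandas as pd

# Toy stand-in for fifa_df with made-up rows (column names as in the table above).
toy_df = pd.DataFrame({
    "int_player_id": [101, 102, 103, 104, 105],
    "int_team_id": [1, 1, 1, 2, 2],
    "int_long_passing": [80, 70, 60, 75, 65],
    "int_ball_control": [85, 72, 58, 77, 66],
    "int_dribbling": [88, 68, 55, 79, 60],
    "int_overall_rating": [86, 74, 63, 78, 67],
})

# Make the row position the node index (0 .. num_nodes - 1).
toy_df = toy_df.reset_index(drop=True)

# Node features: one row of skill values per player.
feature_cols = ["int_long_passing", "int_ball_control", "int_dribbling"]
x = toy_df[feature_cols].to_numpy(dtype=np.float32)

# Node-level labels (assumption: predict the overall rating).
y = toy_df["int_overall_rating"].to_numpy(dtype=np.float32)

# Edges (assumption: connect every pair of teammates, in both directions).
edges = []
for _, group in toy_df.groupby("int_team_id"):
    node_ids = group.index.to_list()
    edges.extend(itertools.permutations(node_ids, 2))
edge_index = np.array(edges, dtype=np.int64).T  # shape [2, num_edges]

print(x.shape, y.shape, edge_index.shape)  # (5, 3) (5,) (2, 8)
```

When applying the same idea to the real `fifa_df`, note that sorting leaves the index non-consecutive (0, 33, 57, ... in the table above), so resetting the index first keeps node indices aligned with the rows of the feature matrix.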
