Skip to content

Commit

Permalink
pushing 1st sujan graph
Browse files Browse the repository at this point in the history
  • Loading branch information
parthi2929 committed Apr 16, 2024
1 parent ea2f4e1 commit a9349c5
Show file tree
Hide file tree
Showing 2 changed files with 310 additions and 1 deletion.
305 changes: 305 additions & 0 deletions content/Sujan-03.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,305 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 5,
"id": "148eefe0-80b8-4790-b6c5-cf159980fbcd",
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"id": "3db54c12-88f9-4f79-b521-28503e893816",
"metadata": {},
"source": [
"# Sujan Graph\n",
"\n",
"### 1. Create latest Graph\n",
"\n",
"From `relationships.csv` file, we create latest graph html file for maximum year. We create this file for purpose that would be clear in next step. \n",
"\n",
"Any one could see this generated html file with interactive graph. This is not efficient. Also voila forbids loading the file.."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "862a2571-7c20-4b94-b44e-0ae2e6784d24",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sujan_latest.html\n",
"Latest graph generated successfully\n"
]
}
],
"source": [
"from pyvis.network import Network\n",
"import pandas as pd\n",
"import networkx as nx\n",
"from ipywidgets import interact, IntSlider, fixed, Layout\n",
"from IPython.display import IFrame, display, Javascript\n",
"\n",
"# Function to read the CSV and return a DataFrame\n",
"def read_csv_to_df(csv_file_path):\n",
" df = pd.read_csv(csv_file_path)\n",
" return df\n",
"\n",
"def filter_graph_by_year(df, year):\n",
" # Filter DataFrame for relationships up to the selected year\n",
" df_filtered = df[df['year'] <= year]\n",
"\n",
" # Recreate the graph based on the filtered DataFrame\n",
" G_filtered = nx.Graph()\n",
"\n",
" # Calculate weights for persons and companies based on their frequency up to the selected year\n",
" person_weights = df_filtered['person'].value_counts().to_dict()\n",
" company_weights = df_filtered['company'].value_counts().to_dict()\n",
"\n",
" # Add nodes and edges based on the filtered DataFrame\n",
" for _, row in df_filtered.iterrows():\n",
" person_weight = person_weights.get(row['person'], 1)\n",
" company_weight = company_weights.get(row['company'], 1)\n",
"\n",
" G_filtered.add_node(row['person'], type='person', weight=person_weight)\n",
" G_filtered.add_node(row['company'], type='company', weight=company_weight)\n",
" G_filtered.add_edge(row['person'], row['company'], relation=row['year'])\n",
"\n",
" return G_filtered\n",
"\n",
"\n",
"def plot_networkx_graph_by_year_pyvis(df, year):\n",
" G = filter_graph_by_year(df, year)\n",
" \n",
" # Create a Pyvis network graph\n",
" net = Network(notebook=True, height=\"400px\", width=\"100%\", cdn_resources=\"remote\")\n",
" \n",
"\n",
" # Add nodes with initial properties\n",
" for node, attr in G.nodes(data=True):\n",
" color = 'violet' if attr['type'] == 'person' else 'orange'\n",
" value = attr.get('weight', 1)\n",
" # Check for specific name to change the color\n",
" if \"sujan\" in node.lower() and attr['type'] == 'person':\n",
" color = '#3d46ff'\n",
" net.add_node(node, title=node, color=color, value=value)\n",
"\n",
"\n",
" # Add edges to the network\n",
" for source, target, edge_attrs in G.edges(data=True):\n",
" net.add_edge(source, target, title=edge_attrs['relation'])\n",
" \n",
" \n",
" # Set options for the visualization\n",
" net.set_options(\"\"\"\n",
" var options = {\n",
" \"physics\": {\n",
" \"enabled\": true,\n",
" \"minVelocity\": 0.75,\n",
" \"repulsion\": {\n",
" \"centralGravity\": 0.1,\n",
" \"springLength\": 200,\n",
" \"springConstant\": 0.05,\n",
" \"nodeDistance\": 300,\n",
" \"damping\": 0.5\n",
" },\n",
" \"forceAtlas2Based\": {\n",
" \"avoidOverlap\": 1,\n",
" \"gravitationalConstant\": -50,\n",
" \"centralGravity\": 0.01,\n",
" \"springConstant\": 0.08,\n",
" \"damping\": 0.4\n",
" },\n",
" \"maxVelocity\": 50,\n",
" \"solver\": \"forceAtlas2Based\",\n",
" \"timestep\": 0.35\n",
" }\n",
" }\n",
" \"\"\")\n",
" \n",
" # Generate the network graph\n",
" net.show('sujan_latest.html')\n",
" return IFrame('sujan_latest.html', width='100%', height='450px')\n",
"\n",
"# Assuming `read_csv_to_df` is a function you have defined to read the CSV file into a DataFrame.\n",
"df = read_csv_to_df('relationships.csv')\n",
"year_max = df['year'].max()\n",
"plot_networkx_graph_by_year_pyvis(df, year_max)\n",
"print(\"Latest graph generated successfully\")"
]
},
{
"cell_type": "markdown",
"id": "78c8fc68-d65c-4f7e-8a3f-5e74f2d82d17",
"metadata": {},
"source": [
"### 2. Extract Positions\n",
"\n",
"The file created earlier using pyvis has best positioned nodes. We extract this for our custom node positions in next step, where for every year we want to see the graph. If we do not hard code best node position, for every year, pyvis decides its own positions, which make tracking the yearly changes very difficult visually. \n",
"\n",
"To achieve this, we use hints from [here](https://github.com/WestHealth/pyvis/issues/157#issuecomment-1288566121) to extract and dump the positions, and read that file directly in our next step. "
]
},
{
"cell_type": "markdown",
"id": "79259a2d-64d2-4b6d-8844-b72fcb281f12",
"metadata": {},
"source": [
"```js\n",
"function download(content, fileName, contentType) {\n",
" var a = document.createElement(\"a\");\n",
" var file = new Blob([content], {type: contentType});\n",
" a.href = URL.createObjectURL(file);\n",
" a.download = fileName;\n",
" a.click();\n",
"}\n",
"network.storePositions();\n",
"download(JSON.stringify(data.nodes.get()), 'positions.txt', 'text/plain');\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "63097021-88d1-446e-ab68-7499f8d024c6",
"metadata": {},
"source": [
"### Create Yearly Evolution Graph\n",
"\n",
"Now that we have best positions from file for nodes, for every year, for matching nodes, we create graph at those positions only as below. Sometimes, output does not show up here, but it will in voila server. \n",
"\n",
"The change from previous year and shown year, would be displayed in yellow edges, so it would be clear what changed every year. This is one important visual hint to get to know what delta happened in an year. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f3a056bf-692f-4357-b6d5-28fd758c2e8e",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a3f2dc6d18854248829871c9cf7097f9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(IntSlider(value=1994, description='Year:', layout=Layout(width='100%'), max=2023, min=19…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import networkx as nx\n",
"from pyvis.network import Network\n",
"from ipywidgets import interact, IntSlider, Layout\n",
"from IPython.display import display, HTML\n",
"import json\n",
"\n",
"def read_positions(file_path):\n",
" with open(file_path, \"r\") as file:\n",
" positions_data = json.load(file)\n",
" positions = {item['id']: (item['x'], item['y']) for item in positions_data}\n",
" return positions\n",
"\n",
"def prepare_full_graph(df):\n",
" G = nx.Graph()\n",
" person_weights = df['person'].value_counts().to_dict()\n",
" company_weights = df['company'].value_counts().to_dict()\n",
"\n",
" for _, row in df.iterrows():\n",
" person_weight = person_weights.get(row['person'], 1)\n",
" company_weight = company_weights.get(row['company'], 1)\n",
" G.add_node(row['person'], type='person', year=row['year'], label=row['person'], weight=person_weight)\n",
" G.add_node(row['company'], type='company', year=row['year'], label=row['company'], weight=company_weight)\n",
" G.add_edge(row['person'], row['company'], year=row['year'])\n",
"\n",
" return G\n",
"\n",
"def plot_networkx_graph_by_year_pyvis_2(positions, G, year, prev_year_edges=set()):\n",
" net = Network(notebook=True, height=\"450px\", width=\"100%\", bgcolor=\"#222222\", font_color=\"white\", cdn_resources='remote')\n",
" current_edges = set()\n",
"\n",
" for node, attr in G.nodes(data=True):\n",
" if attr['year'] <= year:\n",
" color = 'violet' if attr['type'] == 'person' else 'orange'\n",
" value = attr.get('weight', 1)\n",
" if \"sujan\" in node.lower() and attr['type'] == 'person':\n",
" color = '#87dfff'\n",
" node_pos = positions.get(node, (0, 0))\n",
" net.add_node(node, label=node, x=node_pos[0], y=node_pos[1], color=color, value=value, fixed=True)\n",
"\n",
" for source, target, edge_attrs in G.edges(data=True):\n",
" if edge_attrs['year'] <= year:\n",
" current_edges.add((source, target))\n",
" if (source, target) not in prev_year_edges and edge_attrs['year'] > year - 1:\n",
" edge_color = 'yellow'\n",
" width = 5 # 2 times thicker\n",
" else:\n",
" edge_color = '#87dfff'\n",
" width = 1\n",
" net.add_edge(source, target, color=edge_color, width=width)\n",
"\n",
" net.show(\"tmp3.html\")\n",
" display(HTML(\"tmp3.html\"))\n",
" return current_edges\n",
"\n",
"def interactive_networkx_graph_pyvis(G, positions):\n",
" year_min, year_max = min(attr['year'] for _, attr in G.nodes(data=True)), max(attr['year'] for _, attr in G.nodes(data=True))\n",
" prev_year_edges = set()\n",
"\n",
" def update_graph(year):\n",
" nonlocal prev_year_edges\n",
" prev_year_edges = plot_networkx_graph_by_year_pyvis_2(positions, G, year, prev_year_edges)\n",
"\n",
" interact(update_graph, year=IntSlider(value=year_min, min=year_min, max=year_max, step=1, description='Year:', layout=Layout(width='100%')))\n",
"\n",
"# Load data and create graph\n",
"df = pd.read_csv('relationships.csv') # Replace with your actual CSV data path\n",
"positions = read_positions('positions.txt') # Use the path where the positions file is saved\n",
"G = prepare_full_graph(df)\n",
"interactive_networkx_graph_pyvis(G, positions)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3cd1187-cb22-4eb2-ac70-3b69a9bc0c4b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:.conda-dataviz]",
"language": "python",
"name": "conda-env-.conda-dataviz-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.19"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
6 changes: 5 additions & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,8 @@ channels:
dependencies:
- xeus-python
- pandas <2
- bqplot
- ipywidgets
- networkx
- pandas
- plotly
- pyvis

0 comments on commit a9349c5

Please sign in to comment.