# Creating a Network

We will use the data we collected in yesterday's survey to create networks and, later on, visualise them.

In this notebook we will read the survey results from a '.csv' file and transform them into a network that we can then load into Gephi for visualisation.

To run this notebook, you will need to install another package called 'networkx'.

Do you still remember how to install a package? If not, just take a look at the Installation Document for a reminder.
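
If you are unsure whether a package is already installed, a quick check like the one below can help. This is only a sketch: the helper name `is_installed` is made up for this example and is not part of the course files. Anything reported as missing still needs to be installed as described in the Installation Document.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be found in the current Python environment."""
    # hypothetical helper for illustration; not part of the course material
    return importlib.util.find_spec(package_name) is not None


# the two main packages this notebook relies on
for name in ["pandas", "networkx"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```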

In [3]:
import pandas as pd # pandas is a package that makes it easier to deal with dataframes
import matplotlib.pyplot as plt # matplotlib is python's standard package for data plotting
import seaborn as sns # seaborn will make your plots prettier
import networkx as nx # networkx is the package we use to build and export networks
import itertools # itertools helps with looping over combinations, e.g. all pairs of people
from scraping_functions import * # functions we have prepared in a separate file to make your life easier
from create_network import * # provides the create_network function used further down
%matplotlib inline

In [5]:
# read in the data
df = pd.read_csv('test.csv')
# rename the columns to make it easier to reference them
df.columns = ['time','nickname','country','language','living_in_dk','age','bachelor','gender','hobbies','operating','food','screen','expectations','sds','wake_up']

In [6]:
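# convert all text to lowercase so that e.g. 'Pizza' and 'pizza' count as the same answer
# note: from pandas 2.1 on, applymap is deprecated and has been renamed to DataFrame.map
df = df.applymap(lambda s: s.lower() if isinstance(s, str) else s)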

In [7]:
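# the hobbies are currently stored as one comma-separated string per person
# this code converts each string to a list (n=2 means at most two splits, i.e. up to three hobbies)
hobbies_list = df['hobbies'].str.split(',', n=2, expand=False)
# swap the original string column for the list version (it ends up as the last column)
df = df.drop('hobbies', axis=1)
df = pd.concat([df, hobbies_list], axis=1)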
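
To see what the split does, it can help to try it on a tiny example. The sketch below uses made-up answers, not the survey data. It also shows one possible way to strip leftover spaces, which may matter if the answers were typed with a space after each comma; whether the course functions already handle this is not something this notebook tells us.

```python
import pandas as pd

# hypothetical sample answers, not taken from the survey
sample = pd.Series(["cycling, reading, cooking", "running,coding,reading,chess"])

# n=2 allows at most two splits, so each list has at most three items
split = sample.str.split(",", n=2, expand=False)
print(split[0])  # the space after each comma is kept
print(split[1])  # everything after the second comma stays together in the last item

# optional clean-up: strip whitespace so ' reading' and 'reading' count as the same hobby
cleaned = split.apply(lambda hobbies: [h.strip() for h in hobbies])
print(cleaned[0])
```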

In [8]:
# print the first few rows to see what the data looks like
df.head()

Unnamed: 0,time,nickname,country,language,living_in_dk,age,bachelor,gender,operating,food,screen,expectations,sds,wake_up,hobbies
0,18.08.2022 09:14:58,test_user1,denmark,danish,>20 years,23,political science,male,windows,pizza,4 - 6 hours,fun :d,balbjakdfjlkdsjfads,3,"[cycling, reading, cooking]"
1,18.08.2022 09:15:53,test_user2,denmark,danish,>20 years,25,business,female,mac,pasta,6 - 8 hours,meeting lots of new people,wanted to learn coding,1,"[cycling, horse riding, swimming]"
2,18.08.2022 09:18:09,test_user3,germany,german,5-10 years,27,engineering and computer science,male,windows,pizza,8 - 10 hours,cool people,just for fun,4,"[cooking, reading, singing]"
3,18.08.2022 09:19:02,test_user4,denmark,danish,> 1 year,23,engineering and computer science,non-binary,windows,sushi,4 - 6 hours,nothing,sounds cool,4,"[swimming, running, cycling]"
4,18.08.2022 09:19:55,test_user5,argentina,spanish,< 1 year,29,humanities,male,windows,pasta,< 2 hours,sunshine,couldnt think of sth better,5,"[running, coding, reading]"


Now we will create a network from this dataframe. Two people are connected if they share a value within a specific category, for example the same country or the same favourite food.
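
To make the idea concrete, here is a minimal sketch of how such a network could be built with networkx. It is not the course's own `create_network` function, whose code is not shown in this notebook. The name `build_shared_attribute_network`, the `node_column` parameter, the `weight` attribute and the toy data are all assumptions made for this illustration.

```python
import itertools

import networkx as nx
import pandas as pd


def build_shared_attribute_network(df, column, node_column="nickname"):
    """Connect two people whenever they share a value in `column`.

    Hypothetical stand-in for illustration; works for single values and for lists.
    """
    G = nx.Graph()
    G.add_nodes_from(df[node_column])  # every person becomes a node

    def as_set(value):
        # a list (e.g. hobbies) counts as several values, anything else as one
        if isinstance(value, list):
            return set(value)
        # missing answers should never create a link
        return set() if pd.isna(value) else {value}

    rows = list(zip(df[node_column], df[column]))
    # look at every pair of people exactly once
    for (name_a, value_a), (name_b, value_b) in itertools.combinations(rows, 2):
        shared = as_set(value_a) & as_set(value_b)
        if shared:
            # store the number of shared values as the edge weight
            G.add_edge(name_a, name_b, weight=len(shared))
    return G


# toy data, made up for this example
toy = pd.DataFrame({
    "nickname": ["a", "b", "c"],
    "country": ["denmark", "denmark", "germany"],
    "hobbies": [["cycling", "reading"], ["cycling", "swimming"], ["reading", "singing"]],
})

G_country = build_shared_attribute_network(toy, "country")
G_hobbies = build_shared_attribute_network(toy, "hobbies")
print(list(G_country.edges()))
print(list(G_hobbies.edges(data=True)))
```

In the toy data, only "a" and "b" share a country, so the country network has a single link. In the hobbies network, "a" is linked to both "b" and "c", while "b" and "c" have no hobby in common and stay unconnected.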

Using the function below, you can easily create different types of networks. Just type in the name of the column, in quotes, that you want to use to establish a link between two individuals.

In [201]:
# create the network
# 'country' is just an example: put in the name of the column you want to use for the links
G = create_network(df, 'country')

In [203]:

# now we export the network so you can load it into Gephi and visualise it
nx.write_gexf(G, 'graph_gephi.gexf')
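
Before opening Gephi, you may want to check that an exported file can be read back. The sketch below does this for a tiny made-up graph written to a temporary folder. The names `G_check` and `graph_check.gexf` are placeholders chosen for this example; the same `nx.read_gexf` call should work on 'graph_gephi.gexf' as well.

```python
import os
import tempfile

import networkx as nx

# tiny made-up graph, only used to demonstrate the round trip
G_check = nx.Graph()
G_check.add_edge("test_user1", "test_user2")

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "graph_check.gexf")
    nx.write_gexf(G_check, path)   # export, as in the cell above
    G_loaded = nx.read_gexf(path)  # read the file back in

# the node and edge counts should match the graph we exported
print(G_loaded.number_of_nodes(), G_loaded.number_of_edges())
```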