# Import predefined or your own datasets 

**Authors**: Andreas Kruff, Johann Schaible, Marcos Oliveira

**Version**: 20.04.2020

**Description**: The "Data" class lets you load the predefined data sets that ship with this toolbox. You can also import your own data sets and work with them in the same way.

The data sets that are used in this tutorial are part of the following paper:

**Génois, Mathieu & Zens, Maria & Lechner, Clemens & Rammstedt, Beatrice & Strohmaier, Markus. (2019). Building connections: How scientists meet each other during a conference.**

The data sets are available here: 

https://zenodo.org/record/2531537#.X0OObcgzaUl

For more information about the methods that are explained in this tutorial you can check out the online documentation of this toolbox here:

https://gesiscss.github.io/face2face/

## Table of Contents
#### [Import predefined datasets](#predefined)
#### [Import your own datasets](#own)
#### [Create Data object with dataframes](#df)
#### [Replace String attributes](#replace)

## Import predefined datasets
<a name="predefined"></a>

First, we import the toolbox and its functions.

In [1]:
import face2face as f2f

If you only want to use a predefined data set, pass one of the following names: "WS16", "ICCSS17", "test" or "Synthetic".
Keep in mind that not every data set has to contain metadata the way the predefined ones in this toolbox do.

In [2]:
df = f2f.Data("test")
df.interaction.head()

| Index | Time | i | j |
|---|---|---|---|
| 0 | 20 | 0 | 1 |
| 1 | 40 | 1 | 2 |
| 2 | 40 | 1 | 3 |
| 3 | 40 | 2 | 3 |
| 4 | 60 | 4 | 6 |


In [3]:
df_ws16 = f2f.Data("WS16")
df_ws16.interaction.head()

| Index | Time | i | j |
|---|---|---|---|
| 0 | 1480486100 | 125 | 130 |
| 1 | 1480486100 | 7 | 130 |
| 2 | 1480486100 | 9 | 110 |
| 3 | 1480486120 | 9 | 130 |
| 4 | 1480486160 | 125 | 130 |


In [5]:
df_ws16.metadata.head()

| Index | ID | Age | Sex | Country | Language | Education | Academic Background | Role | Previous participation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | F | C1 | L1 |  |  | 4.0 | No |
| 1 | 1 | 1.0 | F |  |  | 4.0 | 4.0 | 2.0 | No |
| 2 | 2 | 0.0 | F | Other | L2 | 1.0 | 1.0 | 2.0 | No |
| 3 | 3 |  |  |  |  |  |  |  |  |
| 4 | 4 | 1.0 | M | Other | L1 | 4.0 | 4.0 | 2.0 | Yes |


If you access the .metadata attribute of a data set that has no metadata, an error will occur, but you can still use the object to analyze the tij data.
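A minimal sketch of guarding against the missing-metadata case described above. The tutorial does not say which error type is raised, so the example uses a hypothetical stand-in class ("TijOnlyData") and catches "Exception" broadly; "metadata_or_none" is an illustrative helper, not part of the toolbox.

```python
class TijOnlyData:
    """Hypothetical stand-in for a Data object that was loaded without metadata."""

    def __init__(self, interaction):
        self.interaction = interaction

    @property
    def metadata(self):
        # Assumption: the real toolbox raises some error here; the exact
        # error type is not documented in this tutorial.
        raise AttributeError("this data set has no metadata")


def metadata_or_none(data):
    """Return data.metadata, or None if accessing it fails."""
    try:
        return data.metadata
    except Exception:  # broad on purpose, because the error type is unknown
        return None


data = TijOnlyData(interaction=[(20, 0, 1), (40, 1, 2)])
print(metadata_or_none(data))   # None
print(len(data.interaction))    # 2, the tij data is still usable
```

With such a helper, code that works on both kinds of data sets can branch on "None" instead of failing.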

## Import your own data sets 
<a name="own"></a>

If you import your own data sets, make sure that the column containing the IDs is named "ID".

To show how to import your own data sets, we use the "WS16" data set as an example. If your data set has no header row with column names, create a list of column names and pass it as the input parameter "meta_attr_list". Below, we create this list for the example data set "WS16" based on the information provided in the <a href="../data/WS16/readme_WS16">readme_ws16</a>. For the import you also need the location of the data set; in this case, it lies in the top-level directory "data" of the repository. To set the right separator, open the data files and check how the columns are separated.

In [7]:
column_name_list = ["ID","Age","Sex","Country","Language","Education","Academic Background","Role","Previous participation"]

In [8]:
df_ws16 = f2f.Data(path_tij= "../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", meta_attr_list=column_name_list)
df_ws16.metadata.head()

| Index | ID | Age | Sex | Country | Language | Education | Academic Background | Role | Previous participation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | F | C1 | L1 |  |  | 4.0 | No |
| 1 | 1 | 1.0 | F |  |  | 4.0 | 4.0 | 2.0 | No |
| 2 | 2 | 0.0 | F | Other | L2 | 1.0 | 1.0 | 2.0 | No |
| 3 | 3 |  |  |  |  |  |  |  |  |
| 4 | 4 | 1.0 | M | Other | L1 | 4.0 | 4.0 | 2.0 | Yes |
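The separator check mentioned for this import can also be done programmatically before calling the toolbox. The following is a rough sketch using only plain Python; "guess_separator" and its candidate list are illustrative assumptions, not part of face2face, and the sample lines mimic the tab-separated tij rows shown earlier.

```python
def guess_separator(sample_lines, candidates=("\t", ",", ";", " ")):
    """Return the first candidate that occurs equally often (and at least once) in every line."""
    lines = [line for line in sample_lines if line.strip()]
    for sep in candidates:
        counts = {line.count(sep) for line in lines}
        # A plausible separator appears the same non-zero number of times per line.
        if len(counts) == 1 and counts != {0}:
            return sep
    return None


# A few lines in the style of the WS16 tij file (tab-separated).
sample = [
    "1480486100\t125\t130",
    "1480486100\t7\t130",
    "1480486100\t9\t110",
]
print(repr(guess_separator(sample)))  # '\t'
```

The result can then be passed as "separator_tij" or "separator_meta". For files with many missing values or quoted fields this simple heuristic may fail, so a manual look at the file remains the safer check.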


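If your data set already contains a header, you can use the header parameter instead of the meta_attr_list parameter.
If the metadata file contains a header, use "meta" as input for the header parameter. Note that the WS16 files used here have no header row, so in the output below the first data row is read as the column names.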

In [9]:
df_ws16 = f2f.Data(path_tij= "../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", header="meta")
df_ws16.metadata.head()

| Index | 0 | 0.1 | F | C1 | L1 | NA | NA.1 | 4 | No |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | F |  |  | 4.0 | 4.0 | 2.0 | No |
| 1 | 2 | 0.0 | F | Other | L2 | 1.0 | 1.0 | 2.0 | No |
| 2 | 3 |  |  |  |  |  |  |  |  |
| 3 | 4 | 1.0 | M | Other | L1 | 4.0 | 4.0 | 2.0 | Yes |
| 4 | 5 | 2.0 | M | C1 | L1 | 5.0 | 4.0 | 3.0 | No |


In [10]:
df_ws16.interaction.head()

| Index | Time | i | j |
|---|---|---|---|
| 0 | 1480486100 | 125 | 130 |
| 1 | 1480486100 | 7 | 130 |
| 2 | 1480486100 | 9 | 110 |
| 3 | 1480486120 | 9 | 130 |
| 4 | 1480486160 | 125 | 130 |
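The effect of the header setting can be illustrated independently of the toolbox. The sketch below uses Python's standard "csv" module on two rows in the style of the headerless WS16 tij file; "read_rows" is a hypothetical helper and does not reflect how face2face reads files internally.

```python
import csv
import io

# Two tab-separated rows without a header line (values as in the tij output above).
raw = "1480486100\t125\t130\n1480486100\t7\t130\n"


def read_rows(text, has_header, names=None):
    """Split tab-separated text into (header, body), either from the file or from given names."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if has_header:
        # The first line is consumed as column names.
        return rows[0], rows[1:]
    # No header in the file: use the supplied names and keep every row as data.
    return list(names), rows


header, body = read_rows(raw, has_header=True)
print(header, len(body))    # ['1480486100', '125', '130'] 1

header2, body2 = read_rows(raw, has_header=False, names=["Time", "i", "j"])
print(header2, len(body2))  # ['Time', 'i', 'j'] 2
```

Declaring a header for a file that has none silently turns the first data row into column names and drops it from the data, which matches the metadata output shown for the "meta" setting.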


If your tij data already contains a header and your metadata does not, you can combine the header parameter with meta_attr_list. In this case, the input for the header parameter is "tij".

In [11]:
df_ws16 = f2f.Data(path_tij= "../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", header="tij", meta_attr_list=column_name_list)
df_ws16.metadata.head()

| Index | ID | Age | Sex | Country | Language | Education | Academic Background | Role | Previous participation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | F | C1 | L1 |  |  | 4.0 | No |
| 1 | 1 | 1.0 | F |  |  | 4.0 | 4.0 | 2.0 | No |
| 2 | 2 | 0.0 | F | Other | L2 | 1.0 | 1.0 | 2.0 | No |
| 3 | 3 |  |  |  |  |  |  |  |  |
| 4 | 4 | 1.0 | M | Other | L1 | 4.0 | 4.0 | 2.0 | Yes |


In [12]:
df_ws16.interaction.head()

| Index | 1480486100 | 125 | 130 |
|---|---|---|---|
| 0 | 1480486100 | 7 | 130 |
| 1 | 1480486100 | 9 | 110 |
| 2 | 1480486120 | 9 | 130 |
| 3 | 1480486160 | 125 | 130 |
| 4 | 1480486180 | 9 | 21 |


If both data sets already have a header, you can use "all" as input for the header parameter.

In [13]:
df_ws16 = f2f.Data(path_tij= "../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", header="all")
df_ws16.metadata.head()

| Index | 0 | 0.1 | F | C1 | L1 | NA | NA.1 | 4 | No |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | F |  |  | 4.0 | 4.0 | 2.0 | No |
| 1 | 2 | 0.0 | F | Other | L2 | 1.0 | 1.0 | 2.0 | No |
| 2 | 3 |  |  |  |  |  |  |  |  |
| 3 | 4 | 1.0 | M | Other | L1 | 4.0 | 4.0 | 2.0 | Yes |
| 4 | 5 | 2.0 | M | C1 | L1 | 5.0 | 4.0 | 3.0 | No |


In [14]:
df_ws16.interaction.head()

| Index | 1480486100 | 125 | 130 |
|---|---|---|---|
| 0 | 1480486100 | 7 | 130 |
| 1 | 1480486100 | 9 | 110 |
| 2 | 1480486120 | 9 | 130 |
| 3 | 1480486160 | 125 | 130 |
| 4 | 1480486180 | 9 | 21 |


## Create Data object with dataframes
<a name="df"></a>

If you want to create and work with your own dataframes, or use the dataframes returned by the create_network method, you can pass them via the meta_df and tij_df parameters. The create_network method will be described in another tutorial, but note already that every function of this method returns a list of dataframes as its second output, so you can use these dataframes to create a Data object.

To test this, we can reuse the Data object from earlier and the dataframes it contains. Keep in mind that the parameters for the dataframe import assume dataframes that already have headers.

In [15]:
df_ws16 = f2f.Data(path_tij= "../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", meta_attr_list=column_name_list)
df_interaction = df_ws16.interaction.head()
df_meta = df_ws16.metadata.head()

As you can see above, we imported the "WS16" data set and saved the first five rows of each of its two dataframes as new dataframes. We could also have passed "df_ws16.interaction.head()" and "df_ws16.metadata.head()" directly as input parameters.

In [16]:
df_ws16_new = f2f.Data(meta_df=df_meta, tij_df=df_interaction)

In [17]:
df_ws16_new.interaction

| Index | Time | i | j |
|---|---|---|---|
| 0 | 1480486100 | 125 | 130 |
| 1 | 1480486100 | 7 | 130 |
| 2 | 1480486100 | 9 | 110 |
| 3 | 1480486120 | 9 | 130 |
| 4 | 1480486160 | 125 | 130 |


In [18]:
df_ws16_new.metadata

| Index | ID | Age | Sex | Country | Language | Education | Academic Background | Role | Previous participation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | F | C1 | L1 |  |  | 4.0 | No |
| 1 | 1 | 1.0 | F |  |  | 4.0 | 4.0 | 2.0 | No |
| 2 | 2 | 0.0 | F | Other | L2 | 1.0 | 1.0 | 2.0 | No |
| 3 | 3 |  |  |  |  |  |  |  |  |
| 4 | 4 | 1.0 | M | Other | L1 | 4.0 | 4.0 | 2.0 | Yes |


If you don't have metadata, you can import just the tij dataframe.

In [19]:
df_ws16_new = f2f.Data(tij_df=df_interaction)

In [20]:
df_ws16_new.interaction

| Index | Time | i | j |
|---|---|---|---|
| 0 | 1480486100 | 125 | 130 |
| 1 | 1480486100 | 7 | 130 |
| 2 | 1480486100 | 9 | 110 |
| 3 | 1480486120 | 9 | 130 |
| 4 | 1480486160 | 125 | 130 |


## Replace String attributes 
<a name="replace"></a>

A function in another tutorial requires string attribute values to be transformed into float values. For this you can use the class method "replace_str_attr_to_float". Within each column, identical strings are replaced with the same float value.

In [15]:
df_ws16 = f2f.Data("WS16")
test = df_ws16.replace_str_attr_to_float()
print(test.metadata.head())

       ID     Age     Sex  Country  Language  Education  Academic Background    Role  Previous participation
Index                                                                                                       
0       0 0.00000 1.00000  1.00000   0.00000    2.00000              2.00000 4.00000                 0.00000
1       1 1.00000 1.00000  2.00000   2.00000    4.00000              4.00000 2.00000                 0.00000
2       2 0.00000 1.00000  1.00000   3.00000    1.00000              1.00000 2.00000                 0.00000
3       3 2.00000 2.00000  2.00000   2.00000    2.00000              2.00000 2.00000                 2.00000
4       4 1.00000 0.00000  1.00000   0.00000    4.00000              4.00000 2.00000                 1.00000
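To make the idea of a per-column string-to-float mapping concrete, here is a small sketch in plain Python. It is not the toolbox's implementation: "encode_column" is a hypothetical helper, and it assigns codes in order of first appearance, so the specific numbers differ from the output of "replace_str_attr_to_float" shown above.

```python
def encode_column(values):
    """Map each distinct string in one column to a float, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if isinstance(value, str):
            if value not in codes:
                # The next unused code is the number of strings seen so far.
                codes[value] = float(len(codes))
            encoded.append(codes[value])
        else:
            # Non-string values (numbers, missing values) are passed through unchanged.
            encoded.append(value)
    return encoded, codes


sex = ["F", "F", "M", "F"]
encoded, codes = encode_column(sex)
print(encoded)  # [0.0, 0.0, 1.0, 0.0]
print(codes)    # {'F': 0.0, 'M': 1.0}
```

Because the mapping is built separately for each column, the same float can stand for different strings in different columns, so the returned dictionary is worth keeping if the original labels are needed later.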
