# Import predefined or your own datasets 

**Authors**: Andreas Kruff, Johann Schaible, Marcos Oliveira

**Version**: 20.04.2020

**Description**: The class "Data" lets you use the predefined datasets included in this toolbox. You can also import your own datasets to work with them.

For more information about the methods explained in this tutorial, see the online documentation of this toolbox:

https://gesiscss.github.io/face2face/

## Table of Contents
#### [Import predefined datasets](#predefined)
#### [Import your own datasets](#own)
#### [Create Data object with dataframes](#df)
#### [Replace String attributes](#replace)

## Import predefined datasets
<a name="predefined"></a>

First, import the toolbox and its functions:

```python
import face2face as f2f
```

If you just want to use a predefined dataset, you can load the synthetic dataset that ships with the toolbox by passing its name:

```python
df = f2f.Data("Synthetic")
df.interaction.head()
```

| Index | Time | i | j |
|---|---|---|---|
| 0 | 20 | 7 | 182 |
| 1 | 40 | 7 | 182 |
| 2 | 40 | 14 | 15 |
| 3 | 40 | 68 | 92 |
| 4 | 40 | 7 | 182 |


```python
df.metadata.head()
```

| Index | ID | type |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 3 | 1 |
| 4 | 4 | 0 |


If you use a dataset that does not contain metadata, an error will occur when you try to access `df.metadata` like this, but you can still use the object to analyze the tij data (the interactions with the columns `Time`, `i` and `j`).

## Import your own data sets 
<a name="own"></a>

If you import your own datasets, make sure that the column containing the IDs is named "ID".
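If the ID column in your metadata has a different name, one option is to rename it with pandas before creating the Data object, for example via the `meta_df` parameter described later in this tutorial. The sketch below assumes pandas is installed; the original column name `person_id` is hypothetical.

```python
import pandas as pd

# Hypothetical metadata whose ID column is not yet called "ID".
meta = pd.DataFrame({"person_id": [0, 1, 2], "type": [1, 0, 0]})

# face2face expects the ID column to be named "ID".
meta = meta.rename(columns={"person_id": "ID"})

print(list(meta.columns))  # ['ID', 'type']
```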

To show how you can import your own datasets, we use the "Synthetic" dataset as an example.

If your metadata file has no header with the column names, create a list of column names and pass it as the input parameter `meta_attr_list`. Below, we create such a list matching the columns of the example dataset "Synthetic".

To import the files, you have to know where they are located; in this case, they lie inside the `data/Synthetic_Data` folder of the face2face repository. To set the right separators (`separator_tij` and `separator_meta`), check how the columns in each file are separated; here, both files are comma-separated.

```python
column_name_list = ["ID", "type"]
```

```python
df = f2f.Data(path_tij="../face2face/data/Synthetic_Data/synthetic_tij.dat", path_meta="../face2face/data/Synthetic_Data/synthetic_metadata.dat", separator_tij=",", separator_meta=",", meta_attr_list=column_name_list)
df.metadata.head()
```

| Index | ID | type |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 3 | 1 |
| 4 | 4 | 0 |
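For comparison, the same table can be produced with plain pandas by reading a headerless, comma-separated file and supplying the column names yourself, which is the role `meta_attr_list` plays here. This is a sketch assuming pandas is installed; the inline sample mimics the first rows of `synthetic_metadata.dat` as shown above, and we have not checked whether face2face uses `read_csv` this way internally.

```python
import io

import pandas as pd

# Headerless, comma-separated sample mimicking the first rows of
# synthetic_metadata.dat (values taken from the output above).
raw_meta = "0,1\n1,0\n2,0\n3,1\n4,0\n"

column_name_list = ["ID", "type"]

# header=None: the file has no header line, so every line is data.
# names=...: supply the column names explicitly.
meta = pd.read_csv(io.StringIO(raw_meta), sep=",", header=None, names=column_name_list)

print(meta.head())
```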


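If your dataset already contains a header, you can use the `header` parameter instead of the `meta_attr_list` parameter.
If your metadata file contains a header, set `header="meta"`. Note that the synthetic metadata file used here has no header line, so in the output below its first row (`0,1`) is used as the column names.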

```python
df = f2f.Data(path_tij="../face2face/data/Synthetic_Data/synthetic_tij.dat", path_meta="../face2face/data/Synthetic_Data/synthetic_metadata.dat", separator_tij=",", separator_meta=",", header="meta")
df.metadata.head()
```

| Index | 0 | 1 |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 2 | 0 |
| 2 | 3 | 1 |
| 3 | 4 | 0 |
| 4 | 5 | 1 |
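The column names `0` and `1` in the table above can be reproduced with plain pandas: declaring that the first line is a header on a file that has none turns the first data row into column names and removes it from the data. This sketch assumes pandas is installed and that face2face behaves similarly to `pandas.read_csv` with `header=0`, which we have not verified.

```python
import io

import pandas as pd

# Headerless sample mimicking the start of synthetic_metadata.dat.
raw_meta = "0,1\n1,0\n2,0\n3,1\n4,0\n5,1\n"

# header=0 tells pandas that line 0 holds the column names.
# Because this sample has no real header, the row "0,1" becomes the names.
meta = pd.read_csv(io.StringIO(raw_meta), sep=",", header=0)

print(list(meta.columns))  # ['0', '1']
print(len(meta))  # 5, the first data row was consumed as the header
```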


```python
df.interaction.head()
```

| Index | Time | i | j |
|---|---|---|---|
| 0 | 20 | 7 | 182 |
| 1 | 40 | 7 | 182 |
| 2 | 40 | 14 | 15 |
| 3 | 40 | 68 | 92 |
| 4 | 60 | 7 | 182 |


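If your tij file already contains a header but your metadata file does not, set `header="tij"` and pass the metadata column names via `meta_attr_list`. The synthetic tij file has no header line, so in the interaction output below its first row (`20,7,182`) is used as the column names.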

```python
df = f2f.Data(path_tij="../face2face/data/Synthetic_Data/synthetic_tij.dat", path_meta="../face2face/data/Synthetic_Data/synthetic_metadata.dat", separator_tij=",", separator_meta=",", header="tij", meta_attr_list=column_name_list)
df.metadata.head()
```

| Index | ID | type |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 3 | 1 |
| 4 | 4 | 0 |


```python
df.interaction.head()
```

| Index | 20 | 7 | 182 |
|---|---|---|---|
| 0 | 40 | 7 | 182 |
| 1 | 40 | 14 | 15 |
| 2 | 40 | 68 | 92 |
| 3 | 60 | 7 | 182 |
| 4 | 80 | 7 | 182 |
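Since choosing the wrong `header` value silently shifts the data, it can help to check a file's first two lines before importing it. The heuristic below is hypothetical and not part of face2face: a column votes for "header" when its first-line value is not numeric but its second-line value is. Python's `csv.Sniffer.has_header` implements a more thorough version of the same idea.

```python
def is_number(value: str) -> bool:
    """Return True if value parses as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def looks_like_header(first_line: str, second_line: str, sep: str = ",") -> bool:
    """Guess whether first_line is a header (hypothetical heuristic).

    Returns True if any column is non-numeric in the first line
    but numeric in the second line.
    """
    first = first_line.strip().split(sep)
    second = second_line.strip().split(sep)
    return any(not is_number(a) and is_number(b) for a, b in zip(first, second))


print(looks_like_header("20,7,182", "40,7,182"))  # False: plain tij data
print(looks_like_header("Time,i,j", "20,7,182"))  # True: header line
```

String attributes in data rows (for example `1,F` followed by `2,M`) do not trigger the check, because the attribute column is non-numeric in both lines.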


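If both files already contain a header, set `header="all"`. With the headerless synthetic files, this again turns the first row of each file into column names, as the outputs below show.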
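The four cases described in this section can be summarized by the hypothetical helper below, which picks the `header` value from which files contain a header line. It only restates the text above; returning `None` stands for not passing `header` at all, which we assume is how the headerless case is expressed (in that case the metadata column names go into `meta_attr_list`).

```python
from typing import Optional


def header_option(tij_has_header: bool, meta_has_header: bool) -> Optional[str]:
    """Pick a value for the header parameter of f2f.Data (hypothetical helper)."""
    if tij_has_header and meta_has_header:
        return "all"
    if tij_has_header:
        return "tij"
    if meta_has_header:
        return "meta"
    # Neither file has a header: omit header and use meta_attr_list.
    return None


print(header_option(True, False))  # tij
```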

```python
df = f2f.Data(path_tij="../face2face/data/Synthetic_Data/synthetic_tij.dat", path_meta="../face2face/data/Synthetic_Data/synthetic_metadata.dat", separator_tij=",", separator_meta=",", header="all")
df.metadata.head()
```

| Index | 0 | 1 |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 2 | 0 |
| 2 | 3 | 1 |
| 3 | 4 | 0 |
| 4 | 5 | 1 |


```python
df.interaction.head()
```

| Index | 20 | 7 | 182 |
|---|---|---|---|
| 0 | 40 | 7 | 182 |
| 1 | 40 | 14 | 15 |
| 2 | 40 | 68 | 92 |
| 3 | 60 | 7 | 182 |
| 4 | 80 | 7 | 182 |


## Create Data object with dataframes
<a name="df"></a>

If you want to create and work with your own dataframes, or use the dataframes returned by the `create_network` method, you can pass them through the `meta_df` and `tij_df` parameters. The `create_network` method is described in another tutorial; for now, note that each of its functions returns a list of dataframes as its second output, which you can use to create a Data object.
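A minimal pandas sketch of building such dataframes by hand is shown below. The column names `Time`, `i`, `j` and `ID`, `type` and the values are copied from the outputs in this tutorial; we assume, without having checked the face2face source, that `f2f.Data` expects exactly these column names. The face2face call is left as a comment so the snippet runs without the toolbox.

```python
import pandas as pd

# tij data with the columns Time, i and j, as in the outputs of this tutorial
# (values copied from the first rows of the "Synthetic" dataset).
tij_df = pd.DataFrame(
    {"Time": [20, 40, 40, 40], "i": [7, 7, 14, 68], "j": [182, 182, 15, 92]}
)

# Metadata: the ID column must be named "ID"; "type" is an attribute column
# (values copied from the first rows of the "Synthetic" metadata).
meta_df = pd.DataFrame({"ID": [0, 1, 2, 3, 4], "type": [1, 0, 0, 1, 0]})

# With face2face installed, the dataframes could then be passed like this:
# df_own = f2f.Data(meta_df=meta_df, tij_df=tij_df)
print(tij_df)
print(meta_df)
```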

To test this, we reuse the "Synthetic" dataset we imported earlier and the dataframes it contains. Keep in mind that the dataframe parameters expect dataframes that already have column headers.

```python
df = f2f.Data(path_tij="../face2face/data/Synthetic_Data/synthetic_tij.dat", path_meta="../face2face/data/Synthetic_Data/synthetic_metadata.dat", separator_tij=",", separator_meta=",", meta_attr_list=column_name_list)
df_interaction = df.interaction.head()
df_meta = df.metadata.head()
```

As you can see above, we imported the "Synthetic" dataset and saved the first five rows of each of its two dataframes as new dataframes. We could also have passed `df.interaction.head()` and `df.metadata.head()` directly as input parameters.
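Because pandas' `head()` returns only the first five rows by default, the Data object created from `df_interaction` and `df_meta` contains just those rows. The pandas-only sketch below uses a made-up ten-row dataframe to show the difference between passing `head()` and a full copy of the data.

```python
import pandas as pd

# A made-up ten-row tij-style dataframe, used only to illustrate head().
interaction = pd.DataFrame(
    {"Time": range(20, 220, 20), "i": [7] * 10, "j": [182] * 10}
)

first_rows = interaction.head()  # default n=5: only the first five rows
full_copy = interaction.copy()   # all rows, independent of the original

print(len(first_rows), len(full_copy))  # 5 10
```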

```python
df_new = f2f.Data(meta_df=df_meta, tij_df=df_interaction)
```

```python
df_new.interaction
```

| Index | Time | i | j |
|---|---|---|---|
| 0 | 20 | 7 | 182 |
| 1 | 40 | 7 | 182 |
| 2 | 40 | 14 | 15 |
| 3 | 40 | 68 | 92 |
| 4 | 60 | 7 | 182 |


```python
df_new.metadata
```

| Index | ID | type |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 3 | 1 |
| 4 | 4 | 0 |


If you don't have metadata, you can pass only the tij dataframe.

```python
df_new = f2f.Data(tij_df=df_interaction)
```

```python
df_new.interaction
```

| Index | Time | i | j |
|---|---|---|---|
| 0 | 20 | 7 | 182 |
| 1 | 40 | 7 | 182 |
| 2 | 40 | 14 | 15 |
| 3 | 40 | 68 | 92 |
| 4 | 60 | 7 | 182 |


## Replace String attributes 
<a name="replace"></a>

A function covered in another tutorial requires string attribute values to be transformed into float values. For this, you can use the class function `replace_str_attr_to_float`. Within every column, it replaces identical strings with the same float value.
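The behavior described above can be sketched for a pandas dataframe with `pandas.factorize`: within each string column, equal strings receive the same float code. This is a hypothetical stand-in, not face2face's implementation, so the actual float values that `replace_str_attr_to_float` assigns may differ. The helper name, column names and values are made up.

```python
import pandas as pd


def str_columns_to_float(meta: pd.DataFrame) -> pd.DataFrame:
    """Replace string values with floats, column by column (hypothetical helper).

    Within a column, identical strings map to the same float; the codes
    follow the order in which each string first appears. Missing values
    would become -1.0 with pandas.factorize.
    """
    result = meta.copy()
    for column in result.columns:
        if pd.api.types.is_string_dtype(result[column]):
            codes, _uniques = pd.factorize(result[column])
            result[column] = codes.astype(float)
    return result


meta = pd.DataFrame(
    {"ID": [0, 1, 2, 3], "class": ["A", "B", "A", "C"], "gender": ["F", "M", "M", "F"]}
)
converted = str_columns_to_float(meta)
print(converted)
```

Numeric columns such as `ID` are left unchanged, and the input dataframe is not modified because the helper works on a copy.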