# Import predefined or your own datasets 

**Author**: Andreas Kruff

**Version**: 20.04.2020

**Description**: The class "Data" lets you use the predefined datasets that ship with the package. You can also import your own datasets and work with them in the same way.

## Table of Contents
#### [Import predefined datasets](#predefined)
#### [Import your own datasets](#own)
#### [Replace String attributes](#replace)

## Import predefined datasets
<a name="predefined"></a>

The cell below only needs to be executed once and can be ignored afterwards. It adds the parent directory to the module search path so that the data and the functions of this library are accessible.

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

First, import the class together with its functions.

In [2]:
from face2face.imports.load_all_data import Data

To use one of the predefined datasets, pass its name to the class, for example "WS16", "SFHH09", ...
Keep in mind that not every predefined dataset comes with metadata the way the "WS16" dataset does.

In [4]:
df = Data("test")
print(df.interaction.head())

       Time  i  j
Index            
0        20  0  1
1        40  1  2
2        40  1  3
3        40  2  3
4        60  4  6


In [3]:
df_ws16 = Data("WS16")
print(df_ws16.interaction.head())
print(df_ws16.metadata.head())

             Time    i    j
Index                      
0      1480486100  125  130
1      1480486100    7  130
2      1480486100    9  110
3      1480486120    9  130
4      1480486160  125  130
       ID     Age  Sex Country Language  Education  Academic Background    Role Previous participation
Index                                                                                                 
0       0 0.00000    F      C1       L1        nan                  nan 4.00000                     No
1       1 1.00000    F     NaN      NaN    4.00000              4.00000 2.00000                     No
2       2 0.00000    F   Other       L2    1.00000              1.00000 2.00000                     No
3       3     nan  NaN     NaN      NaN        nan                  nan     nan                    NaN
4       4 1.00000    M   Other       L1    4.00000              4.00000 2.00000                    Yes


Printing df_sfhh.metadata raises an error because the object does not contain metadata. You can still use the object to analyze the tij dataset.

In [5]:
df_sfhh = Data("SFHH09")
print(df_sfhh.interaction.head())
# print(df_sfhh.metadata)  # raises an error: this dataset has no metadata

        Time     i     j
Index                   
0      32520  1467  1591
1      32560  1513  1591
2      32700  1591  1467
3      32720  1591  1467
4      32740  1591  1467
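
If the same script should handle datasets with and without metadata, one option is to look the attribute up defensively. The sketch below assumes that the error mentioned above is an `AttributeError` (the tutorial does not state which exception is raised). It uses stand-in objects instead of real `Data` instances, and `get_metadata_or_none` is a hypothetical helper, not part of the library.

```python
from types import SimpleNamespace


def get_metadata_or_none(dataset):
    """Return dataset.metadata if the attribute exists, otherwise None.

    Assumption: an object without metadata simply lacks the attribute,
    so a plain attribute access would raise AttributeError.
    """
    return getattr(dataset, "metadata", None)


# Stand-ins for Data objects (hypothetical, for illustration only).
with_meta = SimpleNamespace(interaction="tij table", metadata="metadata table")
without_meta = SimpleNamespace(interaction="tij table")

for name, dataset in [("with_meta", with_meta), ("without_meta", without_meta)]:
    metadata = get_metadata_or_none(dataset)
    if metadata is None:
        print(f"{name}: no metadata available, using interaction data only")
    else:
        print(f"{name}: metadata found")
```

If the library raises a different exception type, the `getattr` default will not catch it and an explicit `try`/`except` around the access would be needed instead.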


## Import your own datasets
<a name="own"></a>

If you import your own datasets, make sure that the column containing the IDs is named "ID".

In [13]:
column_names = ["ID", "Age range", "Gender", "Country", "Language", "Academic seniority", "Academic Background", "Role", "Part. previous edition"]
df_ws16 = Data(path_tij="../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", meta_attr_list=column_names)
print(df_ws16.metadata.head())

       ID  Age range Gender Country Language  Academic seniority  Academic Background    Role Part. previous edition
Index                                                                                                               
0       0    0.00000      F      C1       L1                 nan                  nan 4.00000                     No
1       1    1.00000      F     NaN      NaN             4.00000              4.00000 2.00000                     No
2       2    0.00000      F   Other       L2             1.00000              1.00000 2.00000                     No
3       3        nan    NaN     NaN      NaN                 nan                  nan     nan                    NaN
4       4    1.00000      M   Other       L1             4.00000              4.00000 2.00000                    Yes
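
To illustrate what supplying a list of column names for a headerless, tab-separated file amounts to, here is a minimal pandas sketch. It is not the implementation of `Data`. The inline sample rows and the helper `read_metadata` are made up for illustration, and the check for an "ID" column mirrors the naming requirement stated above.

```python
import io

import pandas as pd


def read_metadata(source, column_names, separator="\t"):
    """Read a headerless metadata table and verify that it has an "ID" column."""
    if "ID" not in column_names:
        raise ValueError('the column holding the identifiers must be named "ID"')
    # header=None: the file has no header row, so the labels come from column_names.
    return pd.read_csv(source, sep=separator, header=None, names=column_names)


# Hypothetical sample content standing in for a metadata file on disk.
sample = io.StringIO("0\tF\tC1\n1\tF\tOther\n2\tM\tC1\n")
metadata = read_metadata(sample, ["ID", "Gender", "Country"])
print(metadata)
```

Failing early on a missing "ID" column is a design choice of this sketch. Whether `Data` performs such a check is not stated in this tutorial.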


If your dataset contains the same metadata attributes in the same order as the predefined datasets, you can rely on the default column names by omitting meta_attr_list or setting it to None.

In [12]:
df_ws16 = Data(path_tij="../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", meta_attr_list=None)
print(df_ws16.metadata.head())

       ID     Age  Sex Country Language  Education  Academic Background    Role Previous participation
Index                                                                                                 
0       0 0.00000    F      C1       L1        nan                  nan 4.00000                     No
1       1 1.00000    F     NaN      NaN    4.00000              4.00000 2.00000                     No
2       2 0.00000    F   Other       L2    1.00000              1.00000 2.00000                     No
3       3     nan  NaN     NaN      NaN        nan                  nan     nan                    NaN
4       4 1.00000    M   Other       L1    4.00000              4.00000 2.00000                    Yes


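If your dataset already contains a header, you can use the header parameter instead of meta_attr_list.
If the metadata file contains a header, pass "meta" as the value of the header parameter. The WS16 files used in this example do not contain a header, so the first data row ends up as the column names in the output below.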

In [8]:
df_ws16 = Data(path_tij="../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", header="meta")
print(df_ws16.metadata.head())
print(df_ws16.interaction.head())

       0     0.1    F     C1   L1      NA    NA.1       4   No
Index                                                         
0      1 1.00000    F    NaN  NaN 4.00000 4.00000 2.00000   No
1      2 0.00000    F  Other   L2 1.00000 1.00000 2.00000   No
2      3     nan  NaN    NaN  NaN     nan     nan     nan  NaN
3      4 1.00000    M  Other   L1 4.00000 4.00000 2.00000  Yes
4      5 2.00000    M     C1   L1 5.00000 4.00000 3.00000   No
             Time    i    j
Index                      
0      1480486100  125  130
1      1480486100    7  130
2      1480486100    9  110
3      1480486120    9  130
4      1480486160  125  130
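
The metadata column labels `0`, `0.1`, `F`, ... in the output above look like what pandas produces when the first data row of a headerless file is read as the header. The sketch below reproduces that effect with `pandas.read_csv` on made-up inline data. That `Data` reads files this way internally is an assumption.

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated sample (three rows, three columns).
raw = "0\t0\tF\n1\t1\tF\n2\t0\tM\n"

# Treating the first row as a header consumes it: only two data rows remain,
# and pandas renames the duplicated label "0" to "0.1".
with_header = pd.read_csv(io.StringIO(raw), sep="\t", header=0)

# Declaring that there is no header keeps all three rows as data.
no_header = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                        names=["ID", "Age", "Sex"])

print(list(with_header.columns))
print(len(with_header), len(no_header))
```

Comparing the row counts of both variants is a quick way to check whether a header setting matches the file at hand.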


If your tij data already contains a header and your metadata does not, you can either use the default metadata columns or pass meta_attr_list. In this case, the value of the header parameter is "tij".

In [9]:
df_ws16 = Data(path_tij="../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", header="tij")
print(df_ws16.metadata.head())
print(df_ws16.interaction.head())

       ID     Age  Sex Country Language  Education  Academic Background    Role Previous participation
Index                                                                                                 
0       0 0.00000    F      C1       L1        nan                  nan 4.00000                     No
1       1 1.00000    F     NaN      NaN    4.00000              4.00000 2.00000                     No
2       2 0.00000    F   Other       L2    1.00000              1.00000 2.00000                     No
3       3     nan  NaN     NaN      NaN        nan                  nan     nan                    NaN
4       4 1.00000    M   Other       L1    4.00000              4.00000 2.00000                    Yes
       1480486100  125  130
Index                      
0      1480486100    7  130
1      1480486100    9  110
2      1480486120    9  130
3      1480486160  125  130
4      1480486180    9   21


If both files already contain a header, pass "all" as the value of the header parameter.

In [11]:
df_ws16 = Data(path_tij="../data/WS16/tij_WS16.dat", path_meta="../data/WS16/metadata_WS16.dat", separator_tij="\t", separator_meta="\t", header="all")
print(df_ws16.metadata.head())
print(df_ws16.interaction.head())

       0     0.1    F     C1   L1      NA    NA.1       4   No
Index                                                         
0      1 1.00000    F    NaN  NaN 4.00000 4.00000 2.00000   No
1      2 0.00000    F  Other   L2 1.00000 1.00000 2.00000   No
2      3     nan  NaN    NaN  NaN     nan     nan     nan  NaN
3      4 1.00000    M  Other   L1 4.00000 4.00000 2.00000  Yes
4      5 2.00000    M     C1   L1 5.00000 4.00000 3.00000   No
       1480486100  125  130
Index                      
0      1480486100    7  130
1      1480486100    9  110
2      1480486120    9  130
3      1480486160  125  130
4      1480486180    9   21


## Replace String attributes 
<a name="replace"></a>

A function in another tutorial requires the string attribute values to be converted into float values. For this purpose, use the class method "replace_str_attr_to_float". Within each column, it replaces identical strings with the same float value.

In [15]:
df_ws16 = Data("WS16")
test = df_ws16.replace_str_attr_to_float()
print(test.metadata.head())

       ID     Age     Sex  Country  Language  Education  Academic Background    Role  Previous participation
Index                                                                                                       
0       0 0.00000 1.00000  1.00000   0.00000    2.00000              2.00000 4.00000                 0.00000
1       1 1.00000 1.00000  2.00000   2.00000    4.00000              4.00000 2.00000                 0.00000
2       2 0.00000 1.00000  1.00000   3.00000    1.00000              1.00000 2.00000                 0.00000
3       3 2.00000 2.00000  2.00000   2.00000    2.00000              2.00000 2.00000                 2.00000
4       4 1.00000 0.00000  1.00000   0.00000    4.00000              4.00000 2.00000                 1.00000
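
The idea of replacing equal strings within a column by equal float values can be sketched with plain pandas. This is not the code behind `replace_str_attr_to_float`. The helper name `strings_to_floats`, the sample table, and the numbering scheme (codes assigned in order of first appearance, missing values left as NaN) are assumptions, so the resulting numbers will differ from the output above.

```python
import pandas as pd


def strings_to_floats(frame):
    """Return a copy in which every non-numeric column is encoded as floats.

    Within a column, equal strings receive the same float code. Codes are
    assigned in order of first appearance; missing values stay NaN.
    """
    result = frame.copy()
    for column in result.columns:
        if not pd.api.types.is_numeric_dtype(result[column]):
            codes, _ = pd.factorize(result[column])
            encoded = pd.Series(codes, index=result.index).astype(float)
            # factorize marks missing values with -1; turn them back into NaN.
            result[column] = encoded.where(codes >= 0)
    return result


# Hypothetical sample metadata, for illustration only.
sample = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "Sex": ["F", "F", "M", None],
    "Country": ["C1", "Other", "C1", "Other"],
})
encoded = strings_to_floats(sample)
print(encoded)
```

Because the codes are built per column, the same string can map to different floats in different columns, which matches the per-column behavior described above.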
