data_for_plot(dataframe, group_by, select) function takes as an input a datarame of the following format:
Year | AIS | SJR | B | C |
---|---|---|---|---|
2015 | 15044 | 29876 | 1708 | 200 |
2016 | 16234 | 31051 | 1200 | 300 |
2017 | 18001 | 35015 | 998 | 777 |
And based on specified parameters returns reorganized dataframe, e.g. for data_for_plot(dataframe, "Year", ["B", "C"]) we get:
Year | Values | Type |
---|---|---|
2015 | 1708 | B |
2015 | 200 | C |
2016 | 1200 | B |
2016 | 300 | C |
2017 | 998 | B |
2017 | 777 | C |
Which is quite useful if we want to further visualize the data using ggplot2.
process_docx_to_txt(directory) function processes all .docx files in given directory to .txt files.
onehot(dataframe, labels_colname) function encodes a dataframe containting a column with exactly one label per one row to onehot. Returns only the onehot encoded dataframe without any data from the original one. See:
book | label |
---|---|
Normal People | novel |
Outline | novel |
Inventing the Future | politics |
➜
novel | politics |
---|---|
1 | 0 |
1 | 0 |
0 | 1 |