## Generate one CSV file from the two original pickle files
This notebook converts two pickle files into a single CSV file. Pickle is convenient for serializing Python objects, but it is Python-specific; a CSV file is easier to open, inspect, and share for further data analysis and processing.

The resulting CSV file contains approximately 86,000 forum posts. Each row holds the post text, the date and time of the post, the associated company, and a unique post ID.

In [None]:
# mounting Google Drive so that the project files are accessible
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [None]:
import pickle
import pandas as pd
import yaml
with open("/content/drive/MyDrive/github_projects/fine_tuning_ai_for_sentiments/config/config.yaml", "r") as f:
  config = yaml.safe_load(f)
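
The cells below build file paths by concatenating config entries with `+`, which only works if every directory entry ends with a slash. A minimal sketch of a more forgiving alternative using `pathlib` follows. The helper `build_path` and all values in `example_config` are hypothetical; only the key names are taken from this notebook.

```python
from pathlib import Path

# Hypothetical values for illustration; the real ones live in config.yaml.
example_config = {
    "project_path": "/content/drive/MyDrive/example_project",
    "data_raw_dir": "data/raw",
    "pickle_file_1": "posts_part_1.pkl",
}


def build_path(config, *keys):
    """Join the config entries named by `keys` into a single path (hypothetical helper)."""
    return Path(*(config[key] for key in keys))


raw_file_1 = build_path(example_config, "project_path", "data_raw_dir", "pickle_file_1")
print(raw_file_1)
```

One caveat of this approach: if a later entry starts with `/`, `pathlib` treats it as absolute and discards the preceding parts, so directory entries should be stored without a leading slash.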

In [None]:
# loading the first pickle file
with open(config["project_path"]+config["data_raw_dir"]+config["pickle_file_1"], "rb") as f:
  loaded_object = pickle.load(f)
# transforming the loaded object into a dataframe with one forum post per row
df_forum_posts_part_1 = pd.DataFrame(loaded_object).transpose()
# moving the post IDs from the index into a column named "ID"
df_forum_posts_part_1 = df_forum_posts_part_1.reset_index(names="ID")
df_forum_posts_part_1

Unnamed: 0,ID,text,datetime,company
0,61,"Wenn vergangene Beiträge liest, weisst was ic...",2021-05-31 22:33:17,1_und_1_Drillisch
1,86,"Viel wichtiger wäre zu erfahren, warum bei de...",2021-04-13 21:11:01,1_und_1_Drillisch
2,107,Du immer mit der Übernahme!,2021-02-11 22:34:51,1_und_1_Drillisch
3,125,Erfreuliche Kursentwicklung. Telekomwerte ...,2021-02-08 11:59:17,1_und_1_Drillisch
4,128,Jemand eine Erklärung warum gerade heute a...,2021-01-27 11:37:50,1_und_1_Drillisch
...,...,...,...,...
74673,1440349,"naja aber was wäre : stoppkurse bei 7,75€ (al...",2008-07-02 12:29:21,Wirecard
74674,1440354,"hübsches sümmchen, was da investiert wurde...",2008-07-02 11:56:10,Wirecard
74675,1440381,SES macht bewusst den Kurs kaput. Das ist ...,2008-07-01 16:00:48,Wirecard
74676,1440414,9 EUR ...... wir kommen,2008-07-01 09:09:33,Wirecard
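
To make the transpose step easier to follow, here is a minimal sketch on toy data. It assumes the unpickled object is a dictionary that maps each post ID to a dictionary of fields; this structure is inferred from the `transpose()` call and the resulting columns, not confirmed by the notebook. All values in `toy_posts` are made up.

```python
import pandas as pd

# Made-up stand-in for the unpickled object: {post_id: {field: value}}.
toy_posts = {
    61: {"text": "first post", "datetime": "2021-05-31 22:33:17", "company": "A"},
    86: {"text": "second post", "datetime": "2021-04-13 21:11:01", "company": "A"},
}

# A dict of dicts puts the outer keys (post IDs) in the columns,
# so transposing yields one row per post.
toy_df = pd.DataFrame(toy_posts).transpose()

# Move the post IDs from the index into a regular column called "ID"
# (the `names` argument requires pandas 1.5 or newer).
toy_df = toy_df.reset_index(names="ID")

# After the transpose every column has dtype object,
# so the timestamps are converted explicitly here.
toy_df["datetime"] = pd.to_datetime(toy_df["datetime"])

print(toy_df)
```

Whether the explicit `pd.to_datetime` conversion is needed for the real data depends on how the timestamps are stored in the pickle files.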


In [None]:
# loading the second pickle file by repeating the process shown above
with open(config["project_path"]+config["data_raw_dir"]+config["pickle_file_2"], "rb") as f:
  loaded_object = pickle.load(f)
# transforming the loaded object into a dataframe with one forum post per row
df_forum_posts_part_2 = pd.DataFrame(loaded_object).transpose()
# moving the post IDs from the index into a column named "ID"
df_forum_posts_part_2 = df_forum_posts_part_2.reset_index(names="ID")
df_forum_posts_part_2

Unnamed: 0,ID,text,datetime,company
0,43,Man spürt wie jemand den Kurs gierig künst...,2021-07-07 13:05:40,1_und_1_Drillisch
1,214,Wovon sollte das bezahlt werden? Die Divid...,2020-08-29 08:20:13,1_und_1_Drillisch
2,330,">>Wer genau hinschaut, erkennt die Sinnlosigk...",2019-06-19 18:04:05,1_und_1_Drillisch
3,429,Der Markt dürfte für Drillisch enger durch di...,2018-10-28 20:27:34,1_und_1_Drillisch
4,607,27.10.15 13:12 aktiencheck.de Maintal (www.a...,2015-11-02 10:04:41,1_und_1_Drillisch
...,...,...,...,...
11500,1438424,"4,86€ scheint nichts dran zu sein toller B...",2008-07-18 16:57:21,Wirecard
11501,1438885,Ich bin kein Freund von Verschwörungstheorien...,2008-07-15 09:51:30,Wirecard
11502,1439226,Du scheinst wohl noch nicht sehr viel Erfahru...,2008-07-06 22:24:06,Wirecard
11503,1439860,sicher nicht..aber die 8-9 Euro wären durc...,2008-06-30 10:23:25,Wirecard


In [None]:
# concatenating the two dataframes
df_data_combined = pd.concat([df_forum_posts_part_1, df_forum_posts_part_2], ignore_index=True)
df_data_combined

Unnamed: 0,ID,text,datetime,company
0,61,"Wenn vergangene Beiträge liest, weisst was ic...",2021-05-31 22:33:17,1_und_1_Drillisch
1,86,"Viel wichtiger wäre zu erfahren, warum bei de...",2021-04-13 21:11:01,1_und_1_Drillisch
2,107,Du immer mit der Übernahme!,2021-02-11 22:34:51,1_und_1_Drillisch
3,125,Erfreuliche Kursentwicklung. Telekomwerte ...,2021-02-08 11:59:17,1_und_1_Drillisch
4,128,Jemand eine Erklärung warum gerade heute a...,2021-01-27 11:37:50,1_und_1_Drillisch
...,...,...,...,...
86178,1438424,"4,86€ scheint nichts dran zu sein toller B...",2008-07-18 16:57:21,Wirecard
86179,1438885,Ich bin kein Freund von Verschwörungstheorien...,2008-07-15 09:51:30,Wirecard
86180,1439226,Du scheinst wohl noch nicht sehr viel Erfahru...,2008-07-06 22:24:06,Wirecard
86181,1439860,sicher nicht..aber die 8-9 Euro wären durc...,2008-06-30 10:23:25,Wirecard
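
Because `ignore_index=True` only renumbers the rows, it does not guarantee that the post IDs stay unique across the two parts. The sketch below shows one way to check this on toy data. The frames `part_1` and `part_2` are made up and deliberately share one ID; whether the real files overlap is not known from this notebook.

```python
import pandas as pd

# Toy frames that deliberately share the post ID 86.
part_1 = pd.DataFrame({"ID": [61, 86], "text": ["a", "b"]})
part_2 = pd.DataFrame({"ID": [43, 86], "text": ["c", "b"]})

# ignore_index=True replaces the original row labels with 0..n-1.
combined = pd.concat([part_1, part_2], ignore_index=True)

# Count the rows whose ID already appeared in an earlier row.
n_duplicates = int(combined["ID"].duplicated().sum())
print(f"duplicate IDs: {n_duplicates}")

# One possible policy: keep the first occurrence of each ID.
deduplicated = combined.drop_duplicates(subset="ID", keep="first").reset_index(drop=True)
print(deduplicated)
```

Keeping the first occurrence is only one option; if duplicates with differing content turned up in the real data, they would need a closer look before dropping anything.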


In [None]:
# saving the combined data as a CSV file without writing the dataframe index
df_data_combined.to_csv(config["project_path"]+config["data_processed_dir"]+"forum_posts_all_initial.csv", index=False)
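
A quick round trip can confirm that the CSV file preserves the data, including text with commas, quotes, and non-ASCII characters. The sketch below uses a made-up toy frame and a temporary directory instead of the project paths; the file name `toy_posts.csv` is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows with a comma, a euro sign, and embedded quotes in the text.
toy = pd.DataFrame({
    "ID": [61, 86],
    "text": ["Kurs bei 7,75€", 'say "hi"'],
    "datetime": pd.to_datetime(["2021-05-31 22:33:17", "2021-04-13 21:11:01"]),
    "company": ["1_und_1_Drillisch", "Wirecard"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "toy_posts.csv"
    # index=False keeps the row labels out of the file, as in the cell above.
    toy.to_csv(csv_path, index=False)
    # CSV stores timestamps as text, so they are parsed again when reading.
    restored = pd.read_csv(csv_path, parse_dates=["datetime"])

print(restored)
```

Without `parse_dates`, the `datetime` column would come back as plain strings, which is worth keeping in mind when loading `forum_posts_all_initial.csv` in later notebooks.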