**EDA**
- how much traffic each channel gets
- which post in each channel gets the most reactions/traction
- what time of day students post the most
- distribution of % links, % memes, % emojis
- frequency of messages from teaching staff vs. students
- count of messages per person
- average length of messages per person
- statistics of mentions within a date range
- which channel is the most useful (measured by the number of messages)

**Automated function**
- a report at the end of each day, with a few summary statistics for that day
- a daily / weekly extract of all the links/uploads posted in public channels 

**NLP**
- what are the top 5 words used per channel (also bi-grams, tri-grams)
- how negative or positive posts were, specifically in the lab help channel
- who is the happiest / saddest person
- number of people who ask questions, and who asks the most

**Network Analysis (if I have time):**
- identify people who answer most questions

# EDA

## Preparing Data
- load multiple json files into dataframes
    - generate one dataframe per channel

## Assess and Clean Data
- drop columns that don't make sense
- drop rows that don't make sense
- clear text that is not needed
- change data types if needed
- convert the timestamp to a readable date

In [880]:
# import basic libraries
import os, json
import pandas as pd
import numpy as np
import glob

### Loading Multiple JSON Files into DataFrames
Generate one dataframe per channel

### channel_gen

In [881]:
# defining file path
path_to_json = '../data/general/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_gen = pd.concat(dfs, ignore_index=True)
# test
channel_gen.tail(100)

Unnamed: 0,client_msg_id,type,text,user,ts,team,user_team,source_team,user_profile,blocks,...,upload,display_as_bot,reactions,last_read,bot_id,bot_profile,subtype,topic,root,purpose
1061,3d62b711-9c19-4aa8-8874-615579ff48e7,message,yes and then using the app,U01S79YDELR,1.616487e+09,T01RBRV5F7H,T01RBRV5F7H,T01RBRV5F7H,"{'avatar_hash': '606def897de6', 'image_72': 'h...","[{'type': 'rich_text', 'block_id': 'Bie', 'ele...",...,,,,,,,,,,
1062,f790d842-a5e8-44e7-be76-678d69cfb06d,message,"<!channel>,\ncan you access the content now? W...",U01SJKB2MG8,1.616487e+09,T01RBRV5F7H,T01RBRV5F7H,T01RBRV5F7H,"{'avatar_hash': '88fa894fdb53', 'image_72': 'h...","[{'type': 'rich_text', 'block_id': '6p6z', 'el...",...,,,,,,,,,,
1063,67fff521-642a-4209-82be-42a6b25e91d7,message,mine not signing in,U01S0P26NKD,1.616487e+09,T01RBRV5F7H,T01RBRV5F7H,T01RBRV5F7H,"{'avatar_hash': 'g7b30129d02b', 'image_72': 'h...","[{'type': 'rich_text', 'block_id': 'UCLUf', 'e...",...,,,,,,,,,,
1064,16c8cc9f-f308-41d9-a642-4120f2fb3041,message,"yes, thanks. I have problems to go the the cla...",U01S79YDELR,1.616487e+09,T01RBRV5F7H,T01RBRV5F7H,T01RBRV5F7H,"{'avatar_hash': '606def897de6', 'image_72': 'h...","[{'type': 'rich_text', 'block_id': 'Voo', 'ele...",...,,,,,,,,,,
1065,9682cdd5-2cf6-4344-9542-9e2711dcf6c1,message,<@U01SJKB2MG8> i cannot join the class,U01S79YDELR,1.616487e+09,T01RBRV5F7H,T01RBRV5F7H,T01RBRV5F7H,"{'avatar_hash': '606def897de6', 'image_72': 'h...","[{'type': 'rich_text', 'block_id': 'upS/I', 'e...",...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1156,,message,<@U01RNEU3SNA> has joined the channel,U01RNEU3SNA,1.616166e+09,,,,,,...,,,,,,,channel_join,,,
1157,,message,<@U01RS9Y6UJH> has joined the channel,U01RS9Y6UJH,1.616167e+09,,,,,,...,,,,,,,channel_join,,,
1158,,message,<@U01SK96QF5E> has joined the channel,U01SK96QF5E,1.616168e+09,,,,,,...,,,,,,,channel_join,,,
1159,,message,<@U01RP2K1606> has joined the channel,U01RP2K1606,1.616172e+09,,,,,,...,,,,,,,channel_join,,,
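
Since the plan is one dataframe per channel, the loading steps can be wrapped in a small helper and reused for each channel folder. This is only a sketch: `load_channel` is a hypothetical name, the files written to the temporary folder are made-up stand-ins for a real export, and sorting the file list assumes the daily files are named by date so that sorted order is chronological.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def load_channel(path_to_json):
    """Read every JSON file in one channel folder into a single dataframe."""
    # sorted() makes the row order reproducible (glob order is arbitrary)
    file_list = sorted(glob.glob(os.path.join(path_to_json, '*.json')))
    if not file_list:
        raise FileNotFoundError(f'no json files found in {path_to_json}')
    dfs = [pd.read_json(file) for file in file_list]
    return pd.concat(dfs, ignore_index=True)


# tiny made-up export, only to show the call
with tempfile.TemporaryDirectory() as tmp:
    for day, text in [('2021-03-23', 'hello'), ('2021-03-22', 'first')]:
        with open(os.path.join(tmp, f'{day}.json'), 'w') as fh:
            json.dump([{'type': 'message', 'text': text, 'user': 'U000'}], fh)
    demo = load_channel(tmp)

print(demo['text'].tolist())
```

With the real data the call would look like `channel_gen = load_channel('../data/general/')`.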


In [882]:
# save to csv
channel_gen.to_csv(r'../data/general/channel_gen.csv', index = False)

**Assessment Part1**

In [883]:
# assess shape
channel_gen.shape

(1161, 32)

In [884]:
# assess column names
channel_gen.columns

Index(['client_msg_id', 'type', 'text', 'user', 'ts', 'team', 'user_team',
       'source_team', 'user_profile', 'blocks', 'thread_ts', 'parent_user_id',
       'edited', 'reply_count', 'reply_users_count', 'latest_reply',
       'reply_users', 'replies', 'is_locked', 'subscribed', 'attachments',
       'files', 'upload', 'display_as_bot', 'reactions', 'last_read', 'bot_id',
       'bot_profile', 'subtype', 'topic', 'root', 'purpose'],
      dtype='object')

In [885]:
# used this cell to check through each column's values to validate if I need them or not ahead of any other cleaning
channel_gen['type'].value_counts()

message    1161
Name: type, dtype: int64

In [886]:
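# check for outliers in the numerical columns
# (uses channel_gen_clean, which is created further down, so this cell was run after the cleaning cells)
channel_gen_clean.describe()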

Unnamed: 0,reply_count,reply_users_count,weekday_number
count,134.0,134.0,1138.0
mean,5.343284,2.880597,2.508787
std,6.021396,2.237899,1.601564
min,1.0,1.0,0.0
25%,2.0,2.0,1.0
50%,4.0,2.0,3.0
75%,6.0,3.0,4.0
max,41.0,16.0,6.0


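None of the values look like data errors, so the data seems healthy. The maximum reply_count of 41 is well above the 75th percentile of 6, but a long thread is plausible, so I keep it.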
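
The outlier judgement can be made more explicit with the interquartile-range rule. The sketch below is not run on the real data: the `reply_count` values are made up (only the maximum of 41 mirrors the summary table), and the 1.5 × IQR threshold is a common convention, not something this notebook prescribes.

```python
import pandas as pd

# made-up reply counts; only the maximum mirrors the describe() output
reply_count = pd.Series([1, 2, 2, 4, 4, 5, 6, 6, 7, 41])

# interquartile range and the usual upper fence
q1, q3 = reply_count.quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# values above the fence are candidates to inspect, not necessarily errors
flagged = reply_count[reply_count > upper_fence]
print(upper_fence, flagged.tolist())
```

A flagged value is only a prompt to look at the thread in question; a long discussion is a legitimate data point.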

**Summary of Assessment Part1**

**Columns to drop:**
- type, team, user_team, source_team, latest_reply, last_read, bot_id, bot_profile, display_as_bot, topic, blocks, edited, is_locked, subscribed, upload, root, purpose, thread_ts, parent_user_id

**Columns to clean & wrangle:**
- subtype: filter out rows by its values, then remove the column
- ts: convert to datetime, remove milliseconds, get day of the week, month of the year, type of day, part of the day
- user_profile: extract real_name into a new column, remove the original
- attachments: extract title, text, link into new columns
- files: extract url_private and who shared
- reactions: extract user, count, name of the emoji
- text: ?

**Columns to visualise**
- reply_count: how many
- reply_users_count: how many
- reply_users: who
- replies: who and when
- files, attachment, reactions

**Cleaning Part1**

In [887]:
# drop columns that are not needed for the analysis
# (client_msg_id is kept: the outputs below still show it among the 13 remaining columns)
channel_gen.drop(['type', 'team', 'user_team', 'source_team', 
                  'latest_reply', 'last_read', 'bot_id', 
                  'bot_profile', 'display_as_bot', 
                  'topic', 'blocks', 'edited', 'is_locked', 
                  'subscribed', 'upload', 
                  'root', 'purpose', 'thread_ts', 
                  'parent_user_id'], axis=1, inplace=True)

In [888]:
# test
channel_gen.shape

(1161, 13)

**Assessment Part2**

In [889]:
# list of columns, their non-null objects and data type of columns
channel_gen.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1161 entries, 0 to 1160
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   client_msg_id      1018 non-null   object 
 1   text               1161 non-null   object 
 2   user               1161 non-null   object 
 3   ts                 1161 non-null   float64
 4   user_profile       1015 non-null   object 
 5   reply_count        134 non-null    float64
 6   reply_users_count  134 non-null    float64
 7   reply_users        134 non-null    object 
 8   replies            134 non-null    object 
 9   attachments        83 non-null     object 
 10  files              103 non-null    object 
 11  reactions          267 non-null    object 
 12  subtype            30 non-null     object 
dtypes: float64(3), object(10)
memory usage: 118.0+ KB


In [890]:
# check, if there are any nulls and NaN values in our data set
channel_gen.isna().sum()

client_msg_id         143
text                    0
user                    0
ts                      0
user_profile          146
reply_count          1027
reply_users_count    1027
reply_users          1027
replies              1027
attachments          1078
files                1058
reactions             894
subtype              1131
dtype: int64

In [891]:
# percentage of missing values per column, rounded to two decimals
channel_gen.isna().mean().round(4) *100

client_msg_id        12.32
text                  0.00
user                  0.00
ts                    0.00
user_profile         12.58
reply_count          88.46
reply_users_count    88.46
reply_users          88.46
replies              88.46
attachments          92.85
files                91.13
reactions            77.00
subtype              97.42
dtype: float64

Because of how the dataset is structured, these missing values do not indicate lost data: most posts simply have no replies, attachments, files or reactions. What matters is that every message has a text and a user, and neither column has missing values. So I intend to keep these columns as they are. Next I look at rows that I don't need, for example channel join messages.
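
One way to back up the claim that the gaps are structural is to check that the reply-related columns are always missing together. The sketch below uses a made-up three-row frame with the same column names; on the real data the same check could be run on `channel_gen` with all four reply columns.

```python
import numpy as np
import pandas as pd

# made-up stand-in: one message with replies, two without
demo = pd.DataFrame({
    'text': ['a', 'b', 'c'],
    'reply_count': [2.0, np.nan, np.nan],
    'reply_users_count': [1.0, np.nan, np.nan],
    'replies': [[{'user': 'U000'}], np.nan, np.nan],
})

reply_cols = ['reply_count', 'reply_users_count', 'replies']
null_mask = demo[reply_cols].isna()

# each row should be either all-missing or all-present across the reply columns
same_pattern = null_mask.nunique(axis=1).eq(1).all()
print(bool(same_pattern))
```

If this returned False on the real data, some messages would have partial reply information and the missing values would deserve a closer look.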

In [892]:
channel_gen['subtype'].value_counts()

channel_join        20
channel_topic        7
channel_purpose      2
thread_broadcast     1
Name: subtype, dtype: int64

In [893]:
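# filter out the system rows by subtype
# note: 'channel_topic' rows (7) are not filtered here, which is why 1138 rows remain
subtypes_to_remove = ['channel_join', 'channel_purpose', 'thread_broadcast']
# .copy() gives an independent dataframe, which avoids SettingWithCopyWarning in later cells
channel_gen_clean = channel_gen[~channel_gen['subtype'].isin(subtypes_to_remove)].copy()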


#test 
channel_gen_clean.shape

(1138, 13)

In [894]:
# drop subtype column
channel_gen_clean.drop('subtype', axis=1, inplace=True) 

#test
channel_gen_clean.shape


(1138, 12)

In [895]:
# convert ts from float (seconds since epoch) to datetime
channel_gen_clean['ts'] = pd.to_datetime(channel_gen_clean['ts'], unit='s')

# remove milliseconds
channel_gen_clean['ts'] = channel_gen_clean['ts'].dt.floor('s')

channel_gen_clean


Unnamed: 0,client_msg_id,text,user,ts,user_profile,reply_count,reply_users_count,reply_users,replies,attachments,files,reactions
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,2021-05-09 08:00:26,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,2021-05-09 08:01:01,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,2021-05-09 15:27:59,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,2021-05-09 15:30:41,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,2021-05-09 15:32:00,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
1141,b78965a7-bbb5-4095-977d-0277d523be45,absolutely!,U01RW140HBP,2021-04-27 16:16:45,"{'avatar_hash': '4c43a3a8d10b', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01SK96QF5E'], 'cou..."
1142,c0cb9b40-89a9-4fc8-a2d5-6d7f5fbc7a3c,Morning all. - anyone completed the RFM lab fr...,U01S133DZ9A,2021-04-28 06:45:19,"{'avatar_hash': 'adb8b81e55b5', 'image_72': 'h...",4.0,3.0,"[U01S7BM4N81, U01RW140HBP, U01S133DZ9A]","[{'user': 'U01S7BM4N81', 'ts': '1619592485.075...",,,"[{'name': '+1', 'users': ['U01S7KCL3DF'], 'cou..."
1143,58ea4cb8-c316-44e5-bf8b-e02b79dd69b4,"Morning simon, mine is here:\n<https://public....",U01S7BM4N81,2021-04-28 06:48:05,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01S133DZ9A'], 'cou..."
1144,6a631562-e138-4d10-89c9-7b54adbe7f31,Bur if I recall correctly there’s screenshots ...,U01S7BM4N81,2021-04-28 06:48:51,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01RW140HBP', 'U01S..."


In [913]:
# create a column for the day of the week (Monday=0 ... Sunday=6) using the ts column
# this cell has to run before the ts column is dropped further down
channel_gen_clean['weekday_number'] = channel_gen_clean['ts'].dt.dayofweek
channel_gen_clean

In [897]:
# create a column for the month name using the ts column
channel_gen_clean['month'] = channel_gen_clean['ts'].dt.month_name()
channel_gen_clean


Unnamed: 0,client_msg_id,text,user,ts,user_profile,reply_count,reply_users_count,reply_users,replies,attachments,files,reactions,weekday_number,month
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,2021-05-09 08:00:26,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,2021-05-09 08:01:01,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,2021-05-09 15:27:59,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,,6,May
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,2021-05-09 15:30:41,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,2021-05-09 15:32:00,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",,,,,,,,6,May
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1141,b78965a7-bbb5-4095-977d-0277d523be45,absolutely!,U01RW140HBP,2021-04-27 16:16:45,"{'avatar_hash': '4c43a3a8d10b', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01SK96QF5E'], 'cou...",1,April
1142,c0cb9b40-89a9-4fc8-a2d5-6d7f5fbc7a3c,Morning all. - anyone completed the RFM lab fr...,U01S133DZ9A,2021-04-28 06:45:19,"{'avatar_hash': 'adb8b81e55b5', 'image_72': 'h...",4.0,3.0,"[U01S7BM4N81, U01RW140HBP, U01S133DZ9A]","[{'user': 'U01S7BM4N81', 'ts': '1619592485.075...",,,"[{'name': '+1', 'users': ['U01S7KCL3DF'], 'cou...",2,April
1143,58ea4cb8-c316-44e5-bf8b-e02b79dd69b4,"Morning simon, mine is here:\n<https://public....",U01S7BM4N81,2021-04-28 06:48:05,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01S133DZ9A'], 'cou...",2,April
1144,6a631562-e138-4d10-89c9-7b54adbe7f31,Bur if I recall correctly there’s screenshots ...,U01S7BM4N81,2021-04-28 06:48:51,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01RW140HBP', 'U01S...",2,April


In [898]:
# create a column for the day type (weekday or weekend) using the ts column
channel_gen_clean['day_type'] = channel_gen_clean.ts.dt.weekday.apply(
    lambda x: 'Weekday' if x < 5 else 'Weekend')
channel_gen_clean


Unnamed: 0,client_msg_id,text,user,ts,user_profile,reply_count,reply_users_count,reply_users,replies,attachments,files,reactions,weekday_number,month,day_type
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,2021-05-09 08:00:26,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,2021-05-09 08:01:01,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,2021-05-09 15:27:59,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,,6,May,Weekend
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,2021-05-09 15:30:41,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,2021-05-09 15:32:00,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",,,,,,,,6,May,Weekend
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1141,b78965a7-bbb5-4095-977d-0277d523be45,absolutely!,U01RW140HBP,2021-04-27 16:16:45,"{'avatar_hash': '4c43a3a8d10b', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01SK96QF5E'], 'cou...",1,April,Weekday
1142,c0cb9b40-89a9-4fc8-a2d5-6d7f5fbc7a3c,Morning all. - anyone completed the RFM lab fr...,U01S133DZ9A,2021-04-28 06:45:19,"{'avatar_hash': 'adb8b81e55b5', 'image_72': 'h...",4.0,3.0,"[U01S7BM4N81, U01RW140HBP, U01S133DZ9A]","[{'user': 'U01S7BM4N81', 'ts': '1619592485.075...",,,"[{'name': '+1', 'users': ['U01S7KCL3DF'], 'cou...",2,April,Weekday
1143,58ea4cb8-c316-44e5-bf8b-e02b79dd69b4,"Morning simon, mine is here:\n<https://public....",U01S7BM4N81,2021-04-28 06:48:05,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01S133DZ9A'], 'cou...",2,April,Weekday
1144,6a631562-e138-4d10-89c9-7b54adbe7f31,Bur if I recall correctly there’s screenshots ...,U01S7BM4N81,2021-04-28 06:48:51,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01RW140HBP', 'U01S...",2,April,Weekday


In [899]:
# create a column for the time of day (HH:MM) using the ts column
channel_gen_clean['time'] = channel_gen_clean['ts'].dt.strftime('%H:%M')
channel_gen_clean


Unnamed: 0,client_msg_id,text,user,ts,user_profile,reply_count,reply_users_count,reply_users,replies,attachments,files,reactions,weekday_number,month,day_type,time
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,2021-05-09 08:00:26,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,08:00
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,2021-05-09 08:01:01,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,08:01
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,2021-05-09 15:27:59,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,,6,May,Weekend,15:27
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,2021-05-09 15:30:41,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,15:30
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,2021-05-09 15:32:00,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",,,,,,,,6,May,Weekend,15:32
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1141,b78965a7-bbb5-4095-977d-0277d523be45,absolutely!,U01RW140HBP,2021-04-27 16:16:45,"{'avatar_hash': '4c43a3a8d10b', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01SK96QF5E'], 'cou...",1,April,Weekday,16:16
1142,c0cb9b40-89a9-4fc8-a2d5-6d7f5fbc7a3c,Morning all. - anyone completed the RFM lab fr...,U01S133DZ9A,2021-04-28 06:45:19,"{'avatar_hash': 'adb8b81e55b5', 'image_72': 'h...",4.0,3.0,"[U01S7BM4N81, U01RW140HBP, U01S133DZ9A]","[{'user': 'U01S7BM4N81', 'ts': '1619592485.075...",,,"[{'name': '+1', 'users': ['U01S7KCL3DF'], 'cou...",2,April,Weekday,06:45
1143,58ea4cb8-c316-44e5-bf8b-e02b79dd69b4,"Morning simon, mine is here:\n<https://public....",U01S7BM4N81,2021-04-28 06:48:05,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01S133DZ9A'], 'cou...",2,April,Weekday,06:48
1144,6a631562-e138-4d10-89c9-7b54adbe7f31,Bur if I recall correctly there’s screenshots ...,U01S7BM4N81,2021-04-28 06:48:51,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01RW140HBP', 'U01S...",2,April,Weekday,06:48
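
The derived columns can be cross-checked on a single timestamp with only the standard library. This is a sketch and not part of the notebook's pipeline: `describe_ts` is a hypothetical helper, and the sample value `1620547226.0` is an assumed raw `ts`, chosen because it corresponds to 2021-05-09 08:00:26 UTC, the first row shown in the tables.

```python
from datetime import datetime, timezone


def describe_ts(ts):
    """Derive the same date parts as the dataframe columns from one epoch value."""
    # interpret the float as seconds since the epoch in UTC and drop sub-second precision
    moment = datetime.fromtimestamp(ts, tz=timezone.utc).replace(microsecond=0)
    weekday_number = moment.weekday()  # Monday=0 ... Sunday=6, same as dt.dayofweek
    return {
        'ts': moment.strftime('%Y-%m-%d %H:%M:%S'),
        'weekday_number': weekday_number,
        'month': moment.strftime('%B'),
        'day_type': 'Weekday' if weekday_number < 5 else 'Weekend',
        'time': moment.strftime('%H:%M'),
    }


example = describe_ts(1620547226.0)
print(example)
```

The result should agree with the first table row (weekday_number 6, May, Weekend, 08:00). Note that all times are in UTC, so "time of day" is not necessarily the students' local time.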


In [908]:
# drop ts column (errors='ignore' keeps the cell safe to re-run once ts is gone)
channel_gen_clean.drop('ts', axis=1, inplace=True, errors='ignore')

#test
channel_gen_clean.head()

- user_profile: extract real_name in new column, remove the original
- attachments: extract title, text, link in new columns
- files: extract url_private and who shared
- reactions: extract user, count, name of the emoji
- text: ?
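
One possible way to handle the reactions item from the list above is a small function that flattens a reactions cell. This is a sketch under assumptions: `summarise_reactions` is a hypothetical helper, the sample cell is made up, and the `count` key is assumed from the plan ("extract user, count, name of the emoji") because the printed cells are truncated.

```python
def summarise_reactions(reactions):
    """Return (emoji names, total reaction count, reacting users) for one message."""
    # messages without reactions hold NaN (a float), not a list
    if not isinstance(reactions, list):
        return [], 0, []
    names = [r['name'] for r in reactions]
    # fall back to the number of users if a reaction has no 'count' key
    total = sum(r.get('count', len(r.get('users', []))) for r in reactions)
    users = sorted({u for r in reactions for u in r.get('users', [])})
    return names, total, users


# made-up reactions cell in the same shape as the column values
sample = [
    {'name': '+1', 'users': ['U000A', 'U000B'], 'count': 2},
    {'name': 'tada', 'users': ['U000A'], 'count': 1},
]
with_reactions = summarise_reactions(sample)
without_reactions = summarise_reactions(float('nan'))
print(with_reactions, without_reactions)
```

Applied to the dataframe, `channel_gen_clean['reactions'].apply(summarise_reactions)` would return one tuple per message, which can then be split into separate columns.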


In [902]:
# check dictionary values in files column
channel_gen[~channel_gen['files'].isna()]['files'].iloc[30][0]

{'id': 'F01T66CCE7M',
 'created': 1617376819,
 'timestamp': 1617376819,
 'name': 'm.jpg',
 'title': 'm.jpg',
 'mimetype': 'image/jpeg',
 'filetype': 'jpg',
 'pretty_type': 'JPEG',
 'user': 'U01S0P26NKD',
 'editable': False,
 'size': 407181,
 'mode': 'hosted',
 'is_external': False,
 'external_type': '',
 'is_public': True,
 'public_url_shared': False,
 'display_as_bot': False,
 'username': '',
 'url_private': 'https://files.slack.com/files-pri/T01RBRV5F7H-F01T66CCE7M/m.jpg?t=xoxe-1861879185255-2047593969042-2053756158948-4cf6552631c579c36efd5d2fd242de30',
 'url_private_download': 'https://files.slack.com/files-pri/T01RBRV5F7H-F01T66CCE7M/download/m.jpg?t=xoxe-1861879185255-2047593969042-2053756158948-4cf6552631c579c36efd5d2fd242de30',
 'thumb_64': 'https://files.slack.com/files-tmb/T01RBRV5F7H-F01T66CCE7M-a66afdce77/m_64.jpg?t=xoxe-1861879185255-2047593969042-2053756158948-4cf6552631c579c36efd5d2fd242de30',
 'thumb_80': 'https://files.slack.com/files-tmb/T01RBRV5F7H-F01T66CCE7M-a66afdc

In [900]:
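#files column: extract who shared

def getuserfromfile(x):
    """Applied to the files column: return the user id of the first file, or 'nouser' if there is no file."""
    
    # cells without files hold NaN (a float); cells with files hold a list of dicts
    if not isinstance(x, list):
        return 'nouser'
    else:
        return x[0]['user']

channel_gen_clean['who_shared_files'] = channel_gen_clean['files'].apply(getuserfromfile)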

channel_gen_clean


Unnamed: 0,client_msg_id,text,user,ts,user_profile,reply_count,reply_users_count,reply_users,replies,attachments,files,reactions,weekday_number,month,day_type,time,who_shared_files
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,2021-05-09 08:00:26,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,08:00,nouser
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,2021-05-09 08:01:01,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,08:01,nouser
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,2021-05-09 15:27:59,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,,6,May,Weekend,15:27,nouser
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,2021-05-09 15:30:41,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,15:30,nouser
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,2021-05-09 15:32:00,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",,,,,,,,6,May,Weekend,15:32,nouser
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1141,b78965a7-bbb5-4095-977d-0277d523be45,absolutely!,U01RW140HBP,2021-04-27 16:16:45,"{'avatar_hash': '4c43a3a8d10b', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01SK96QF5E'], 'cou...",1,April,Weekday,16:16,nouser
1142,c0cb9b40-89a9-4fc8-a2d5-6d7f5fbc7a3c,Morning all. - anyone completed the RFM lab fr...,U01S133DZ9A,2021-04-28 06:45:19,"{'avatar_hash': 'adb8b81e55b5', 'image_72': 'h...",4.0,3.0,"[U01S7BM4N81, U01RW140HBP, U01S133DZ9A]","[{'user': 'U01S7BM4N81', 'ts': '1619592485.075...",,,"[{'name': '+1', 'users': ['U01S7KCL3DF'], 'cou...",2,April,Weekday,06:45,nouser
1143,58ea4cb8-c316-44e5-bf8b-e02b79dd69b4,"Morning simon, mine is here:\n<https://public....",U01S7BM4N81,2021-04-28 06:48:05,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01S133DZ9A'], 'cou...",2,April,Weekday,06:48,nouser
1144,6a631562-e138-4d10-89c9-7b54adbe7f31,Bur if I recall correctly there’s screenshots ...,U01S7BM4N81,2021-04-28 06:48:51,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01RW140HBP', 'U01S...",2,April,Weekday,06:48,nouser


In [901]:
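#files column: extract url_private

def geturlfromfile(x):
    """Applied to the files column: return url_private of the first file, or 'nofile' if there is no file."""
    
    # cells without files hold NaN (a float); cells with files hold a list of dicts
    if not isinstance(x, list):
        return 'nofile'
    else:
        return x[0]['url_private']

channel_gen_clean['link_of_file'] = channel_gen_clean['files'].apply(geturlfromfile)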

channel_gen_clean


Unnamed: 0,client_msg_id,text,user,ts,user_profile,reply_count,reply_users_count,reply_users,replies,attachments,files,reactions,weekday_number,month,day_type,time,who_shared_files,link_of_file
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,2021-05-09 08:00:26,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,08:00,nouser,nofile
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,2021-05-09 08:01:01,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,08:01,nouser,nofile
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,2021-05-09 15:27:59,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,,6,May,Weekend,15:27,nouser,nofile
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,2021-05-09 15:30:41,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,,6,May,Weekend,15:30,nouser,nofile
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,2021-05-09 15:32:00,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",,,,,,,,6,May,Weekend,15:32,nouser,nofile
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1141,b78965a7-bbb5-4095-977d-0277d523be45,absolutely!,U01RW140HBP,2021-04-27 16:16:45,"{'avatar_hash': '4c43a3a8d10b', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01SK96QF5E'], 'cou...",1,April,Weekday,16:16,nouser,nofile
1142,c0cb9b40-89a9-4fc8-a2d5-6d7f5fbc7a3c,Morning all. - anyone completed the RFM lab fr...,U01S133DZ9A,2021-04-28 06:45:19,"{'avatar_hash': 'adb8b81e55b5', 'image_72': 'h...",4.0,3.0,"[U01S7BM4N81, U01RW140HBP, U01S133DZ9A]","[{'user': 'U01S7BM4N81', 'ts': '1619592485.075...",,,"[{'name': '+1', 'users': ['U01S7KCL3DF'], 'cou...",2,April,Weekday,06:45,nouser,nofile
1143,58ea4cb8-c316-44e5-bf8b-e02b79dd69b4,"Morning simon, mine is here:\n<https://public....",U01S7BM4N81,2021-04-28 06:48:05,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01S133DZ9A'], 'cou...",2,April,Weekday,06:48,nouser,nofile
1144,6a631562-e138-4d10-89c9-7b54adbe7f31,Bur if I recall correctly there’s screenshots ...,U01S7BM4N81,2021-04-28 06:48:51,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,,"[{'name': '+1', 'users': ['U01RW140HBP', 'U01S...",2,April,Weekday,06:48,nouser,nofile


In [909]:
# drop files column
channel_gen_clean.drop('files', axis=1, inplace=True) 

#test
channel_gen_clean.head()

Unnamed: 0,client_msg_id,text,user,user_profile,reply_count,reply_users_count,reply_users,replies,attachments,reactions,weekday_number,month,day_type,time,who_shared_files,link_of_file
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,6,May,Weekend,08:00,nouser,nofile
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,6,May,Weekend,08:01,nouser,nofile
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,6,May,Weekend,15:27,nouser,nofile
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,6,May,Weekend,15:30,nouser,nofile
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",,,,,,,6,May,Weekend,15:32,nouser,nofile


In [903]:
channel_gen[~channel_gen['attachments'].isna()]['attachments'].iloc[30][0]['app_unfurl_url']

'https://github.com/Caparisun/Linear_Regression_Project'

In [904]:
# attachments column: extract link

def geturlfromattachments(x):
    """Applied to the attachments column: return app_unfurl_url of the first attachment, or 'nolink'."""
    
    # cells without attachments hold NaN; not every attachment has an app_unfurl_url key
    if not isinstance(x, list):
        return 'nolink'
    else:
        return x[0].get('app_unfurl_url', 'nolink')

channel_gen_clean['link_of_attachments'] = channel_gen_clean['attachments'].apply(geturlfromattachments)

channel_gen_clean
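
Not every attachment necessarily carries the same keys, and a missing `app_unfurl_url` raises a KeyError when it is accessed directly. Before deciding which key to extract, it can help to count how often each key occurs. The sketch below is an assumption-laden illustration: `count_attachment_keys` is a hypothetical helper and the sample cells are made up.

```python
from collections import Counter


def count_attachment_keys(cells):
    """Count how often each key appears across all attachment dicts."""
    counts = Counter()
    for cell in cells:
        # cells without attachments hold NaN, so only lists are inspected
        if isinstance(cell, list):
            for attachment in cell:
                counts.update(attachment.keys())
    return counts


# made-up cells in the same shape as the attachments column
cells = [
    [{'title': 'repo', 'app_unfurl_url': 'https://example.com/a'}],
    [{'title': 'doc', 'text': 'notes'}],
    float('nan'),
]
key_counts = count_attachment_keys(cells)
print(key_counts.most_common())
```

On the real data, `count_attachment_keys(channel_gen['attachments'])` would show which link-like keys are present often enough to be worth extracting.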

In [910]:
channel_gen[~channel_gen['user_profile'].isna()]['user_profile'].iloc[0]['real_name']

'Karina Condeixa'

In [911]:
# user_profile column: extract real_name

def getrealnamefromprofile(x):
    """this function is applied to column user_profile
    """
    
    if x != x:
        return 'noname'
    else:
        return x['real_name']

channel_gen_clean['real_name'] = channel_gen_clean['user_profile'].apply(getrealnamefromprofile)

channel_gen_clean

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  channel_gen_clean['real_name'] = channel_gen_clean['user_profile'].apply(getrealnamefromprofile)


Unnamed: 0,client_msg_id,text,user,user_profile,reply_count,reply_users_count,reply_users,replies,attachments,reactions,weekday_number,month,day_type,time,who_shared_files,link_of_file,real_name
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,6,May,Weekend,08:00,nouser,nofile,Karina Condeixa
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,6,May,Weekend,08:01,nouser,nofile,Karina Condeixa
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,6,May,Weekend,15:27,nouser,nofile,Francisco Ebeling
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,"{'avatar_hash': '606def897de6', 'image_72': 'h...",,,,,,,6,May,Weekend,15:30,nouser,nofile,Karina Condeixa
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,"{'avatar_hash': '339c4d69b7d9', 'image_72': 'h...",,,,,,,6,May,Weekend,15:32,nouser,nofile,Francisco Ebeling
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1141,b78965a7-bbb5-4095-977d-0277d523be45,absolutely!,U01RW140HBP,"{'avatar_hash': '4c43a3a8d10b', 'image_72': 'h...",,,,,,"[{'name': '+1', 'users': ['U01SK96QF5E'], 'cou...",1,April,Weekday,16:16,nouser,nofile,siand the LT (she/her)
1142,c0cb9b40-89a9-4fc8-a2d5-6d7f5fbc7a3c,Morning all. - anyone completed the RFM lab fr...,U01S133DZ9A,"{'avatar_hash': 'adb8b81e55b5', 'image_72': 'h...",4.0,3.0,"[U01S7BM4N81, U01RW140HBP, U01S133DZ9A]","[{'user': 'U01S7BM4N81', 'ts': '1619592485.075...",,"[{'name': '+1', 'users': ['U01S7KCL3DF'], 'cou...",2,April,Weekday,06:45,nouser,nofile,Simon Data
1143,58ea4cb8-c316-44e5-bf8b-e02b79dd69b4,"Morning simon, mine is here:\n<https://public....",U01S7BM4N81,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,"[{'name': '+1', 'users': ['U01S133DZ9A'], 'cou...",2,April,Weekday,06:48,nouser,nofile,Thamo
1144,6a631562-e138-4d10-89c9-7b54adbe7f31,Bur if I recall correctly there’s screenshots ...,U01S7BM4N81,"{'avatar_hash': '2adfc641e87e', 'image_72': 'h...",,,,,,"[{'name': '+1', 'users': ['U01RW140HBP', 'U01S...",2,April,Weekday,06:48,nouser,nofile,Thamo
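
The `SettingWithCopyWarning` printed above usually means the dataframe being modified was derived from another one by filtering or selecting, so pandas cannot tell whether the assignment should also affect the original. A minimal sketch of one way to avoid it, assuming `channel_gen_clean` was built as a filtered subset of `channel_gen` (that step is not shown here): take an explicit `.copy()` when creating the subset. The toy data and the names `raw` and `clean` are hypothetical.

```python
import pandas as pd

# toy frame standing in for channel_gen (hypothetical data)
raw = pd.DataFrame({
    'text': ['hi', 'joined the channel', 'bye'],
    'subtype': [None, 'channel_join', None],
    'user_profile': [{'real_name': 'A'}, None, {'real_name': 'B'}],
})

# .copy() makes the filtered frame independent of raw,
# so later column assignments are unambiguous
clean = raw[raw['subtype'].isna()].copy()

# add the new column, then drop the source column without inplace=True
clean['real_name'] = clean['user_profile'].apply(lambda p: p['real_name'])
clean = clean.drop(columns='user_profile')

print(clean)
```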


In [912]:
# drop user_profile column
channel_gen_clean.drop('user_profile', axis=1, inplace=True) 

#test
channel_gen_clean.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


Unnamed: 0,client_msg_id,text,user,reply_count,reply_users_count,reply_users,replies,attachments,reactions,weekday_number,month,day_type,time,who_shared_files,link_of_file,real_name
0,2d1b171f-ab27-4a43-a0b7-0ecabcfc3294,Hang told me to add it in education,U01S79YDELR,,,,,,,6,May,Weekend,08:00,nouser,nofile,Karina Condeixa
1,a3c6083b-3a42-43a0-a621-bd32e26972d2,What improved my score was adding metrics of a...,U01S79YDELR,,,,,,,6,May,Weekend,08:01,nouser,nofile,Karina Condeixa
2,ec4aa0c0-53f5-4e94-9080-2e98ec280b3d,I feel like a slave to this dumb Resume Worded...,U01RRV4JX6Z,31.0,2.0,"[U01S79YDELR, U01RRV4JX6Z]","[{'user': 'U01S79YDELR', 'ts': '1620574241.002...",,,6,May,Weekend,15:27,nouser,nofile,Francisco Ebeling
3,0aac10ae-ccff-47cf-bb2a-69a34819375b,"Francisco, we have to remove the fancy/beautif...",U01S79YDELR,,,,,,,6,May,Weekend,15:30,nouser,nofile,Karina Condeixa
4,39e4728b-90f7-4d84-b149-74d765ca544a,"Ah, ok!",U01RRV4JX6Z,,,,,,,6,May,Weekend,15:32,nouser,nofile,Francisco Ebeling


In [None]:
# keep only the rows whose text mentions a channel topic change
# na=False treats missing text as "no match" so the boolean mask has no NaN
channel_topic_changes = channel_gen.loc[channel_gen['text'].str.contains("channel topic", case=False, na=False)]
channel_topic_changes

### channel_books

In [None]:
### books channel
# defining file path
path_to_json = '../data/books/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_books = pd.concat(dfs, ignore_index=True)
# test
channel_books

In [None]:
# save to csv
channel_books.to_csv(r'../data/books/channel_books.csv', index = False)

### channel_dresource

In [None]:
### data_resources channel
# defining file path
path_to_json = '../data/data_resources/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_dresource = pd.concat(dfs, ignore_index=True)
# test
channel_dresource

In [None]:
# save to csv
channel_dresource.to_csv(r'../data/data_resources/channel_dresource.csv', index = False)

### channel_dbootcamp

In [None]:
### data-bootcamp channel
# defining file path
path_to_json = '../data/data-bootcamp/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_dbootcamp = pd.concat(dfs, ignore_index=True)
# test
channel_dbootcamp


In [None]:
# save to csv
channel_dbootcamp.to_csv(r'../data/data-bootcamp/channel_dbootcamp.csv', index = False)

### channel_dmemes

In [None]:
### data-memes channel
# defining file path
path_to_json = '../data/data-memes/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_dmemes = pd.concat(dfs, ignore_index=True)
# test
channel_dmemes

In [None]:
# save to csv
channel_dmemes.to_csv(r'../data/data-memes/channel_dmemes.csv', index = False)

### channel_dvizbeauties

In [None]:
### data-viz-beauties channel
# defining file path
path_to_json = '../data/data-viz-beauties/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_dvizbeauties = pd.concat(dfs, ignore_index=True)
# test
channel_dvizbeauties

In [None]:
# save to csv
channel_dvizbeauties.to_csv(r'../data/data-viz-beauties/channel_dvizbeauties.csv', index = False)

### channel_finalproject

In [None]:
### final-project channel
# defining file path
path_to_json = '../data/final-project/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_finalproject = pd.concat(dfs, ignore_index=True)
# test
channel_finalproject.head(10)

In [None]:
# save to csv
channel_finalproject.to_csv(r'../data/final-project/finalproject.csv', index = False)

### channel_frustrations

In [None]:
### frustrations-shared channel
# defining file path
path_to_json = '../data/frustrations-shared/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_frustrations = pd.concat(dfs, ignore_index=True)
# test
channel_frustrations

In [None]:
# save to csv
channel_frustrations.to_csv(r'../data/frustrations-shared/frustrations.csv', index = False)

### channel_funcommittee

In [None]:
### fun_committee channel
# defining file path
path_to_json = '../data/fun_committee/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_funcommittee = pd.concat(dfs, ignore_index=True)
# test
channel_funcommittee

In [None]:
# save to csv
channel_funcommittee.to_csv(r'../data/fun_committee/channel_funcommittee.csv', index = False)

### channel_katas

In [None]:
### katas channel
# defining file path
path_to_json = '../data/katas/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_katas = pd.concat(dfs, ignore_index=True)
# test
channel_katas

In [None]:
# save to csv
channel_katas.to_csv(r'../data/katas/channel_katas.csv', index = False)

### channel_labhelp

In [None]:
### lab-help channel
# defining file path
path_to_json = '../data/lab-help/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_labhelp = pd.concat(dfs, ignore_index=True)
# test
channel_labhelp

In [None]:
# save to csv
channel_labhelp.to_csv(r'../data/lab-help/channel_labhelp.csv', index = False)

### channel_music

In [None]:
### music channel
# defining file path
path_to_json = '../data/music/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_music = pd.concat(dfs, ignore_index=True)
# test
channel_music

In [None]:
# save to csv
channel_music.to_csv(r'../data/music/channel_music.csv', index = False)

### channel_random

In [None]:
### random channel
# defining file path
path_to_json = '../data/random/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_random = pd.concat(dfs, ignore_index=True)
# test
channel_random

In [None]:
# save to csv
channel_random.to_csv(r'../data/random/channel_random.csv', index = False)

### channel_vanilla

In [None]:
### vanilla_plus_more channel
# defining file path
path_to_json = '../data/vanilla_plus_more/' 

# get all json files from there
json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

# an empty list to store the data frames
dfs = []
for file in file_list:
    # read data frame from json file
    data = pd.read_json(file)
    # append the data frame to the list
    dfs.append(data)

# concatenate all the data frames in the list
channel_vanilla = pd.concat(dfs, ignore_index=True)
# test
channel_vanilla

In [None]:
# save to csv
channel_vanilla.to_csv(r'../data/vanilla_plus_more/channel_vanilla.csv', index = False)
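
The per-channel cells above all repeat the same steps: glob the json files, read each into a dataframe, concatenate. A minimal sketch of a helper that wraps those steps, in case it helps cut the repetition. The name `load_channel` is made up here. Sorting the file list and raising on an empty folder are additions that the cells above do not have.

```python
import glob
import os

import pandas as pd


def load_channel(channel_dir):
    """Read every json file in channel_dir into one dataframe."""
    # sorted() makes the row order reproducible; glob gives no ordering guarantee
    file_list = sorted(glob.glob(os.path.join(channel_dir, '*.json')))
    if not file_list:
        # pd.concat raises ValueError on an empty list, so fail with a clearer message
        raise FileNotFoundError(f'no json files found in {channel_dir}')
    # one dataframe per file, then stack them with a fresh index
    dfs = [pd.read_json(file) for file in file_list]
    return pd.concat(dfs, ignore_index=True)


# example usage with the paths from the cells above (commented out: needs the data folder)
# channel_books = load_channel('../data/books/')
# channel_books.to_csv('../data/books/channel_books.csv', index=False)
```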