# Twitter Sentiment Analysis

Scraping Twitter with the **SNScrape** library ([documentation](https://github.com/JustAnotherArchivist/snscrape)).

## 1. Initialization
This stage installs the SNScrape library, imports the required libraries, mounts Google Drive in this notebook, and configures pandas.

SNScrape (together with daterangeparser, which is used later to parse date ranges) can be installed with the following script:

In [1]:
!pip install snscrape
!pip install daterangeparser

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting snscrape
  Downloading snscrape-0.6.2.20230320-py3-none-any.whl (71 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m71.8/71.8 KB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: snscrape
Successfully installed snscrape-0.6.2.20230320
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting daterangeparser
  Downloading DateRangeParser-1.3.2-py3-none-any.whl (23 kB)
Installing collected packages: daterangeparser
Successfully installed daterangeparser-1.3.2


The libraries used are `sntwitter` from snscrape for scraping; pandas and numpy to simplify data wrangling; and json, ast, and re to simplify parsing and extracting data. The cell also imports dateutil, daterangeparser, and datetime for date handling, and os for listing files.

In [2]:
from dateutil import parser
from daterangeparser import parse
from datetime import datetime, timedelta
import snscrape.modules.twitter as sntwitter
import pandas as pd
import numpy as np
import json
import ast
import re
import os

Mounting Google Drive with the following script makes it easier to upload and download the files we need:

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


The pandas configuration below tidies the display and allows more rows and columns to be shown:

In [4]:
# Pandas config
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [None]:
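def parseStartEndDate(datestring):
  # Parse the initial offering date; for a date range, keep the start date
  datestring = datestring.replace('.', '').strip()
  try:
    date_format = parse(datestring)[0]
  except Exception:
    if datestring == '-':
      # '-' marks a missing date in the source file
      date_format = None
    else:
      # Fall back to the day-month-year format, e.g. 13-Jan-11
      date_format = datetime.strptime(datestring, "%d-%b-%y")
  return date_format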

FILENAME = '/content/drive/MyDrive/Freelance/twitter_sentiment_analysis/data_new/Aksi_Korporasi_IPO_Tahun_2011_2022.csv'
DESTFILE = '/content/drive/MyDrive/Freelance/twitter_sentiment_analysis/data_new/output_extra/'
df_fileipo = pd.read_csv(FILENAME, sep=';')
df_fileipo['tanggal_penawaran_awal'] = df_fileipo['Tanggal Penawaran Awal'].apply(parseStartEndDate)
df_fileipo['tanggal_ipo'] = pd.to_datetime(df_fileipo['Tanggal Pencatatan IPO'], format="%d-%b-%y")
df_fileipo['diff_days'] = (df_fileipo['tanggal_ipo'] - df_fileipo['tanggal_penawaran_awal']) / np.timedelta64(1, 'D')
df_fileipo['28_days_date'] = df_fileipo['tanggal_ipo'] -  pd.to_timedelta(28, unit='d')
df_fileipo['tgl_awal_coalesced'] = df_fileipo['tanggal_penawaran_awal'].combine_first(df_fileipo['28_days_date'])
df_fileipo['tgl_poin_1'] = df_fileipo['tgl_awal_coalesced'] -  pd.to_timedelta(77, unit='d')
df_fileipo['tanggal_ipo_plus_1'] = df_fileipo['tanggal_ipo'] +  pd.to_timedelta(1, unit='d')
df_fileipo['tgl_awal_coalesced_p_1'] = df_fileipo['tgl_awal_coalesced'] +  pd.to_timedelta(1, unit='d')
df_fileipo = df_fileipo[df_fileipo['tanggal_ipo'].dt.year != 2022]
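
The derived date columns can be easier to follow for a single listing. The sketch below redoes the same arithmetic with the standard library only, using the MBTO row that appears later in this notebook (IPO on 13-Jan-11, no initial offering date). The function name `derive_window` is an illustrative assumption; the dictionary keys simply mirror the notebook's column names.

```python
from datetime import date, timedelta

def derive_window(tanggal_ipo, tanggal_penawaran_awal=None):
    """Recompute the notebook's derived dates for one listing (illustrative helper)."""
    # Fall back to 28 days before the IPO when the initial offering date is missing
    tgl_awal_coalesced = tanggal_penawaran_awal or (tanggal_ipo - timedelta(days=28))
    return {
        'tgl_awal_coalesced': tgl_awal_coalesced,
        # 77 days before the (coalesced) initial offering date
        'tgl_poin_1': tgl_awal_coalesced - timedelta(days=77),
        # One day after each reference date
        'tanggal_ipo_plus_1': tanggal_ipo + timedelta(days=1),
        'tgl_awal_coalesced_p_1': tgl_awal_coalesced + timedelta(days=1),
    }

window = derive_window(date(2011, 1, 13))
print(window['tgl_awal_coalesced'])      # 2010-12-16
print(window['tgl_poin_1'])              # 2010-09-30
print(window['tgl_awal_coalesced_p_1'])  # 2010-12-17
```

These values agree with the MBTO rows shown in the joined output further down. The one-day offsets are presumably there so that a search bounded by `until:` still includes tweets posted on the reference date itself.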

## Phase 1 Rescraping

In [None]:
FILENA = '/content/drive/MyDrive/Freelance/twitter_sentiment_analysis/data_new/na_fase_1.csv'
df_na = pd.read_csv(FILENA)
df_merge_ = df_fileipo.merge(df_na, left_on='kode', right_on='kode', how='left')
df_fileipo = df_merge_[df_merge_['Unnamed: 0'].notna()]
len(df_fileipo)

186

## 2. Define Function
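Tweet retrieval runs three searches per company and concatenates the results: the cashtag (`$kode`), the company name (`nama`), and the word `saham` followed by the stock code. Each search is restricted to Indonesian-language tweets (`lang:in`) posted before the `until` date and is capped at 50,000 tweets.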

In [None]:
def get_tweets_union(kode, nama, until):
  attributes_container = []
  # Using TwitterSearchScraper to scrape data and append tweets to list
  query = '{} until:{} lang:in'.format('$'+kode, until)
  enum = enumerate(sntwitter.TwitterSearchScraper(query).get_items())
  for i,tweet in enum:
    if i >= 50000:
      print('break')
      break
    attributes_container.append(json.loads(tweet.json()))
      
  # Creating a dataframe to load the list
  tweets_dfkd = pd.DataFrame(attributes_container)

  attributes_container = []
  # Using TwitterSearchScraper to scrape data and append tweets to list
  query = '{} until:{} lang:in'.format(nama, until)
  enum = enumerate(sntwitter.TwitterSearchScraper(query).get_items())
  for i,tweet in enum:
    if i >= 50000:
      print('break')
      break
    attributes_container.append(json.loads(tweet.json()))
      
  # Creating a dataframe to load the list
  tweets_dfnm = pd.DataFrame(attributes_container)

  attributes_container = []
  # Using TwitterSearchScraper to scrape data and append tweets to list
  query = 'saham {} until:{} lang:in'.format(kode, until)
  enum = enumerate(sntwitter.TwitterSearchScraper(query).get_items())
  for i,tweet in enum:
    if i >= 50000:
      print('break')
      break
    attributes_container.append(json.loads(tweet.json()))
      
  # Creating a dataframe to load the list
  tweets_dfshm = pd.DataFrame(attributes_container)

  df_return = pd.concat([tweets_dfkd, tweets_dfnm, tweets_dfshm])
  df_return['id'] = df_return['id'].astype(str)
  df_return['conversationId'] = df_return['conversationId'].astype(str)
  df_return['inReplyToTweetId'] = df_return['inReplyToTweetId'].fillna(0).astype(np.int64).astype(str)
  return df_return
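
The three search strings and the 50,000-item cap can be looked at in isolation, without touching the network. The sketch below is a minimal illustration and not part of the original notebook: `build_queries` and `take_limited` are hypothetical helper names, and `itertools.islice` is swapped in for the `enumerate`/`break` pattern used above.

```python
from itertools import islice

def build_queries(kode, nama, until):
    """Return the three search strings used per company (hypothetical helper)."""
    suffix = 'until:{} lang:in'.format(until)
    return [
        '${} {}'.format(kode, suffix),       # cashtag search
        '{} {}'.format(nama, suffix),        # company-name search
        'saham {} {}'.format(kode, suffix),  # "saham <code>" search
    ]

def take_limited(items, limit=50000):
    """Consume at most `limit` items from any iterator (hypothetical helper)."""
    return list(islice(items, limit))

for query in build_queries('GOTO', 'GoTo Gojek Tokopedia', '2022-04-12'):
    print(query)
print(take_limited(iter(range(10)), limit=3))  # [0, 1, 2]
```

Because `take_limited` accepts any iterator, it could wrap the generator returned by `get_items()`, although unlike the loop above it does not print `break` when the cap is reached.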

In [None]:
# # Test get GOTO
# kode = 'GOTO'
# nama = 'GoTo Gojek Tokopedia'
# since = '2021-12-28'
# until = '2022-04-12'
# df_goto = get_tweets_union(kode, nama, until)
# len(df_goto)
# # df_goto['Date Created'] = df_goto['Date Created'].dt.tz_localize(None)
# # .to_excel(DESTFILE+kode+".xlsx")
# # df_goto.to_excel(DESTFILE+kode+".xlsx")

In [None]:
# df_goto = pd.read_excel(DESTFILE+kode+".xlsx")
# df_goto[df_goto.index == 6]['renderedContent'].to_list()
# df_goto[df_goto.index == 6]['renderedContent'].to_list()

# df_goto_unicode = df_goto.applymap(lambda x: x.encode('unicode_escape').decode('utf-8') if isinstance(x, str) else x)
# df_goto_unicode[df_goto_unicode.index == 6]['renderedContent'].to_list()

# df_goto_decode = df_goto.applymap(lambda x: x.encode('utf-16', 'surrogatepass').decode('utf-16') if isinstance(x, str) else x)
# df_goto_decode[df_goto_decode.index == 6]['renderedContent'].to_list()

# list_emitendone = os.listdir(DESTFILE)
# list_emitendone = [li[:-5] for li in list_emitendone]
# # list_emitendone
# list_emiten = df_fileipo[['kode', 'nama', 'tanggal_ipo_plus_1', 'tgl_poin_1']]
# list_emiten = list_emiten[~list_emiten['kode'].isin(list_emitendone)]
# list_emiten.head()

In [None]:
list_emitendone = os.listdir(DESTFILE)
list_emitendone = [li[:-5] for li in list_emitendone]
# list_emitendone
list_emiten = df_fileipo[['kode', 'nama', 'tgl_awal_coalesced_p_1']]
list_emiten = list_emiten[~list_emiten['kode'].isin(list_emitendone)]
len(list_emiten)

0
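
The cell above makes the scrape resumable: any stock code that already has an output file is skipped, which is why the count is 0 once every file exists. A small standalone sketch of the same idea follows; `pending_codes` is a hypothetical name, and it uses `os.path.splitext` rather than slicing off the last five characters so that other extensions are handled too.

```python
import os
import tempfile

def pending_codes(all_codes, dest_dir):
    """Return the codes that have no .xlsx output in dest_dir yet (illustrative helper)."""
    done = {
        os.path.splitext(name)[0]
        for name in os.listdir(dest_dir)
        if name.endswith('.xlsx')
    }
    return [kode for kode in all_codes if kode not in done]

# Demonstration in a throwaway directory: BSSR is already done, JGLE is not
with tempfile.TemporaryDirectory() as dest_dir:
    open(os.path.join(dest_dir, 'BSSR.xlsx'), 'w').close()
    print(pending_codes(['BSSR', 'JGLE'], dest_dir))  # ['JGLE']
```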

In [None]:
list_emiten = list_emiten.T.to_dict()
is_first = True
for emiten in list_emiten.values():
  kode = emiten['kode']
  nama = emiten['nama']
  until = emiten['tgl_awal_coalesced_p_1'].strftime("%Y-%m-%d")

  try:
    df_tweets = get_tweets_union(kode, nama, until)
    df_tweets['kode_saham'] = kode
    df_tweets['nama_saham'] = nama
    df_tweets['unitl_date'] = until

    # if is_first:
    #   df_temp = df_tweets
    #   is_first = False
    # else:
    #   df_temp = pd.concat([df_temp, df_tweets])
    
    # print(kode, nama, since, until, len(df_tweets), len(df_temp))
    # df_tweets = df_tweets.applymap(lambda x: x.encode('unicode_escape').decode('utf-8') if isinstance(x, str) else x)
    df_tweets.to_excel(DESTFILE+kode+".xlsx")
  except Exception:
    # Scraping failed for this code: print it and save a copy of the empty template instead
    print(kode)
    df_tweets = pd.read_excel(DESTFILE+"kosong.xlsx", engine=None).drop(columns='Unnamed: 0')
    df_tweets.to_excel(DESTFILE+kode+".xlsx")

BSSR
JGLE
GMFI
TDPM
NUSA
BOLA
PGJO
AYLS
CBMF
break
KMDS
break
FIMP
MGLV


## Re-grouping

In [None]:
list_emitendone = df_fileipo['kode'].tolist()
is_first = True
for emitendone in list_emitendone:
  if is_first:
    df_all = pd.read_excel(DESTFILE+emitendone+".xlsx")
    is_first = False
  else:
    df_temp = pd.read_excel(DESTFILE+emitendone+".xlsx")
    df_all = pd.concat([df_all, df_temp])

df_all['id'] = df_all['id'].astype(str)
df_all['conversationId'] = df_all['conversationId'].astype(str)
df_all['inReplyToTweetId'] = df_all['inReplyToTweetId'].fillna(0).astype(np.int64).astype(str)
df_all.to_excel(DESTFILE+"all.xlsx")

NameError: ignored
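
One caveat about the `inReplyToTweetId` conversion: a column that contains missing values is normally stored as float64 in pandas, and float64 represents integers exactly only up to 2**53, while many tweet IDs are larger. The sketch below shows the effect with plain Python numbers and a hypothetical helper, `reply_id_to_str`, that avoids the float round trip when the value is still an integer. It cannot recover digits that were already lost upstream.

```python
import math

def reply_id_to_str(value):
    """Convert a reply ID to text without a float round trip (illustrative helper)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return '0'  # same placeholder the notebook uses for "not a reply"
    return str(int(value))

big_id = 2**53 + 1
print(int(float(big_id)) == big_id)            # False: float64 cannot hold this integer exactly
print(reply_id_to_str(big_id) == str(big_id))  # True
print(reply_id_to_str(float('nan')))           # 0
```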

In [None]:
df_all = pd.read_excel(DESTFILE+"all.xlsx", engine=None)
print(len(df_all))
df_all.head()

## Parse Nested Columns

In [None]:
user_keys = ['username',
            'id',
            'displayname',
            'rawDescription',
            'renderedDescription',
            'verified',
            'created',
            'followersCount',
            'friendsCount',
            'statusesCount',
            'favouritesCount',
            'listedCount',
            'mediaCount',
            'location',
            'protected',
            'profileImageUrl',
            'profileBannerUrl',
            'url']
for key in user_keys:
  df_all['user_'+key] = df_all['user'].apply(lambda x: ast.literal_eval(x)[key])

In [None]:
quotedTweets = ['date',
                'rawContent',
                'id',
                'lang']
for key in quotedTweets:
  df_all['quotedTweets_'+key] = df_all['quotedTweet'].apply(lambda x: ast.literal_eval(x)[key] if str(x) != 'nan' else 'nan')
df_all['quotedTweets_username'] = df_all['quotedTweet'].apply(lambda x: ast.literal_eval(x)['user']['username'] if str(x) != 'nan' else 'nan')

df_all = df_all.drop(columns=['user', 'quotedTweet'])

SyntaxError: ignored
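
`ast.literal_eval` raises `SyntaxError` or `ValueError` when a cell does not hold a valid Python literal, which stops the whole `apply`. A more forgiving parser is sketched below; `parse_nested` is a hypothetical helper, and returning a default for bad cells is a design assumption rather than the notebook's behaviour.

```python
import ast

def parse_nested(cell, key, default=None):
    """Read one key from a dict stored as text; return default if that is not possible."""
    if not isinstance(cell, str):
        return default  # NaN and other non-text cells
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return default  # malformed or cut-off text
    if isinstance(value, dict):
        return value.get(key, default)
    return default

print(parse_nested("{'username': 'peyete', 'id': 143358288}", 'username'))  # peyete
print(parse_nested(float('nan'), 'username'))                               # None
print(parse_nested("{'username': 'pey", 'username'))                        # None
```

With a helper like this, the lambda in the loops above could become `lambda x: parse_nested(x, key)`, at the cost of silently turning unreadable cells into missing values.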

In [None]:
df_all = df_all.drop(columns=['user', 'quotedTweet'])
df_all = df_all.drop(columns=['Unnamed: 0', 'Unnamed: 0.1', '_type', 'links', 'media', 'retweetedTweet', 'inReplyToUser', 'mentionedUsers', 
                              'coordinates', 'place', 'hashtags', 'cashtags', 'card', 'viewCount', 'vibe'])

In [None]:
df_allclean = df_all.drop_duplicates()
len(df_allclean)

235143

In [None]:
df_allclean.head()

Unnamed: 0,url,date,rawContent,renderedContent,id,replyCount,retweetCount,likeCount,quoteCount,conversationId,lang,source,sourceUrl,sourceLabel,inReplyToTweetId,content,outlinks,outlinksss,tcooutlinks,tcooutlinksss,username,kode_saham,nama_saham,unitl_date,user_username,user_id,user_displayname,user_rawDescription,user_renderedDescription,user_verified,user_created,user_followersCount,user_friendsCount,user_statusesCount,user_favouritesCount,user_listedCount,user_mediaCount,user_location,user_protected,user_profileImageUrl,user_profileBannerUrl,user_url
0,https://twitter.com/peyete/status/152628446521...,2010-12-16T04:32:00+00:00,"Ada museumnya ;p (@ PT Martina Berto, Kawasan ...","Ada museumnya ;p (@ PT Martina Berto, Kawasan ...",15262844652101632,0,0,0,0,15262844652101632,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,"Ada museumnya ;p (@ PT Martina Berto, Kawasan ...",[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete
1,https://twitter.com/peyete/status/152625905401...,2010-12-16T04:30:59+00:00,Menyusuri lorong cirebon (@ Martina Berto) htt...,Menyusuri lorong cirebon (@ Martina Berto) htt...,15262590540193792,0,0,0,0,15262590540193792,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,Menyusuri lorong cirebon (@ Martina Berto) htt...,[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete
2,https://twitter.com/peyete/status/152398920490...,2010-12-16T03:00:48+00:00,"Gedungnya harum ;p (@ PT Martina Berto, Kawasa...","Gedungnya harum ;p (@ PT Martina Berto, Kawasa...",15239892049076224,0,0,0,0,15239892049076224,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,"Gedungnya harum ;p (@ PT Martina Berto, Kawasa...",[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete
3,https://twitter.com/nurghozan/status/145025929...,2010-12-14T02:11:02+00:00,"Ajarin eike ya boow..semangat!☺""@irmafauzan: T...","Ajarin eike ya boow..semangat!☺""@irmafauzan: T...",14502592939827200,0,0,0,0,14502592939827200,in,"<a href=""http://blackberry.com/twitter"" rel=""n...",http://blackberry.com/twitter,Twitter for BlackBerry®,0,"Ajarin eike ya boow..semangat!☺""@irmafauzan: T...",[],,[],,nurghozan,MBTO,Martina Berto,2010-12-17,nurghozan,107985267,Euis Hadiyono,Hadiyono's daughter | Hasan's wife :),Hadiyono's daughter | Hasan's wife :),False,2010-01-24T12:09:33+00:00,194,145,5760,115,1,94,Indonesia,False,https://pbs.twimg.com/profile_images/567472921...,https://pbs.twimg.com/profile_banners/10798526...,https://twitter.com/nurghozan
4,https://twitter.com/Irma_pulungan/status/14481...,2010-12-14T00:49:05+00:00,Training make up TREND 2011 SARIAYU MARTHA TIL...,Training make up TREND 2011 SARIAYU MARTHA TIL...,14481972520820736,0,0,0,0,14481972520820736,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,Training make up TREND 2011 SARIAYU MARTHA TIL...,[],,[],,Irma_pulungan,MBTO,Martina Berto,2010-12-17,Irma_pulungan,138087271,Irmalasari Pulungan,Be your self...simple & happy with my life...,Be your self...simple & happy with my life...,False,2010-04-28T15:51:42+00:00,43,52,489,237,0,11,"ÜT: -6.354193,106.839067",False,https://pbs.twimg.com/profile_images/378800000...,https://pbs.twimg.com/profile_banners/13808727...,https://twitter.com/Irma_pulungan


## Create Sample

In [None]:
df_allclean = df_allclean.copy()  # work on an explicit copy to avoid SettingWithCopyWarning
df_allclean['until_date'] = pd.to_datetime(df_allclean['unitl_date'], errors='coerce')
df_all_11_21 = df_allclean[df_allclean['until_date'].dt.year != 2022]

len(df_all_11_21)


235143

In [None]:
df_sample = df_all_11_21.sample(1909)
print(len(df_sample))
df_sample.head()

1909


Unnamed: 0,url,date,rawContent,renderedContent,id,replyCount,retweetCount,likeCount,quoteCount,conversationId,lang,source,sourceUrl,sourceLabel,inReplyToTweetId,kode_saham,nama_saham,since_date,unitl_date,user_username,user_id,user_displayname,user_rawDescription,user_renderedDescription,user_verified,user_created,user_followersCount,user_friendsCount,user_statusesCount,user_favouritesCount,user_listedCount,user_mediaCount,user_location,user_protected,user_profileImageUrl,user_profileBannerUrl,user_url,quotedTweets_date,quotedTweets_rawContent,quotedTweets_id,quotedTweets_lang,quotedTweets_username,until_date
81594,https://twitter.com/bpn_loker/status/405975512...,2013-11-28T08:24:39+00:00,Nakhoda - PT Logindo Samudramakmur (Balikpapan...,Nakhoda - PT Logindo Samudramakmur (Balikpapan...,405975512196976640,0,0,0,0,405975512196976640,in,"<a href=""https://dlvrit.com/"" rel=""nofollow"">d...",https://dlvrit.com/,dlvr.it,0,LEAD,Logindo Samudramakmur,2013-08-28,2013-12-12,bpn_loker,2152511527,Lowongan Kerja Balikpapan,𝙻𝚘𝚠𝚘𝚗𝚐𝚊𝚗 𝙺𝚎𝚛𝚓𝚊 𝙶𝚛𝚊𝚝𝚒𝚜 | 𝙳𝙼 𝚕𝚊𝚗𝚐𝚜𝚞𝚗𝚐 |\n𝙼𝚎𝚍𝚒𝚊 𝚂...,𝙻𝚘𝚠𝚘𝚗𝚐𝚊𝚗 𝙺𝚎𝚛𝚓𝚊 𝙶𝚛𝚊𝚝𝚒𝚜 | 𝙳𝙼 𝚕𝚊𝚗𝚐𝚜𝚞𝚗𝚐 |\n𝙼𝚎𝚍𝚒𝚊 𝚂...,False,2013-10-24T08:12:50+00:00,5625,13,27771,0,3,8422,,False,https://pbs.twimg.com/profile_images/143265163...,https://pbs.twimg.com/profile_banners/21525115...,https://twitter.com/bpn_loker,,,,,,2013-12-12
43891,https://twitter.com/rdiy_/status/2225336868081...,2012-07-10T03:32:20+00:00,Promo BB Curve 3G Seharga 1.9 jt di Global Tel...,Promo BB Curve 3G Seharga 1.9 jt di Global Tel...,222533686808166401,0,0,0,0,222533686808166401,in,"<a href=""http://twitterfeed.com"" rel=""nofollow...",http://twitterfeed.com,twitterfeed,0,GLOB,Global Teleshop,2012-03-27,2012-07-11,rdiy_,45073563,Ardi Y ,20s – realistic – minimalism,20s – realistic – minimalism,False,2009-06-06T05:12:19+00:00,4978,303,129043,1652,13,1265,Jakarta Capital Region,False,https://pbs.twimg.com/profile_images/149920376...,https://pbs.twimg.com/profile_banners/45073563...,https://twitter.com/rdiy_,,,,,,2012-07-11
40314,https://twitter.com/MFS8008/status/22219803984...,2012-07-09T05:18:35+00:00,MNC Sky Vision Tbk Akan Segera Lunasi Utang ht...,MNC Sky Vision Tbk Akan Segera Lunasi Utang bi...,222198039840956416,0,0,0,0,222198039840956416,in,"<a href=""https://dlvrit.com/"" rel=""nofollow"">d...",https://dlvrit.com/,dlvr.it,0,MSKY,MNC Sky Vision,2012-03-30,2012-07-10,MFS8008,475232549,MFS8008.com,"Menerima pesanan Kaos Polos, Kaos Sablon, Kaos...","Menerima pesanan Kaos Polos, Kaos Sablon, Kaos...",False,2012-01-26T20:48:26+00:00,118,370,125786,0,1,23,"Bandung, JB, ID",False,https://pbs.twimg.com/profile_images/378800000...,https://pbs.twimg.com/profile_banners/47523254...,https://twitter.com/MFS8008,,,,,,2012-07-10
41981,https://twitter.com/INFO_ASIK/status/200129270...,2012-05-09T07:45:10+00:00,#infoasik MNC Securities &amp; Danareksa Jadi ...,#infoasik MNC Securities &amp; Danareksa Jadi ...,200129270054535169,0,0,0,0,200129270054535169,in,"<a href=""http://twitterfeed.com"" rel=""nofollow...",http://twitterfeed.com,twitterfeed,0,MSKY,MNC Sky Vision,2012-03-30,2012-07-10,INFO_ASIK,324382373,INFO ASIK,follow dan temukan disini info-info asik yang ...,follow dan temukan disini info-info asik yang ...,False,2011-06-26T14:26:51+00:00,605,1423,42418,0,1,2,Indonesia,False,https://pbs.twimg.com/profile_images/152770659...,,https://twitter.com/INFO_ASIK,,,,,,2012-07-10
102708,https://twitter.com/fakealiceu/status/47839977...,2014-06-16T04:52:49+00:00,Magna Finance berencana lepas 70% saham baru h...,Magna Finance berencana lepas 70% saham baru d...,478399775272751105,0,0,0,0,478399775272751105,in,"<a href=""https://dlvrit.com/"" rel=""nofollow"">d...",https://dlvrit.com/,dlvr.it,0,MGNA,Magna Finance,2014-03-24,2014-07-08,fakealiceu,1169031146,Aliceu!,[1/6].HVenusUnited! #EXOVENUS,[1/6].HVenusUnited! #EXOVENUS,False,2013-02-11T14:16:58+00:00,797,2,676614,0,58,6237,Fakefams,False,https://abs.twimg.com/sticky/default_profile_i...,,https://twitter.com/fakealiceu,,,,,,2014-07-08


In [None]:
df_sample.to_excel(DESTFILE+"sample.xlsx")
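
The sample above is drawn without a fixed seed, so rerunning the cell selects different rows. If a repeatable sample is wanted, pandas' `DataFrame.sample` accepts a `random_state` argument. The same idea is sketched below with the standard library only; `reproducible_sample` and the seed value 42 are arbitrary illustrative choices.

```python
import random

def reproducible_sample(rows, k, seed=42):
    """Draw k rows without replacement using a fixed seed (seed value is arbitrary)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

rows = list(range(10000))
first = reproducible_sample(rows, 5)
second = reproducible_sample(rows, 5)
print(first == second)  # True: the same seed gives the same sample
```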

## Cleansing

In [None]:
# df_fileipo = pd.read_csv(FILENAME)
# df_tweets = pd.read_excel(DESTFILE+"gab_plus_offside.xlsx")
df_fileipo[df_fileipo['Kode'] == 'TRUE']

Unnamed: 0.1,No.,Kode,Nama Emiten,Sektor_x,Tanggal Pencatatan IPO,Tanggal Penawaran Awal,Harga Penawaran Awal,Harga IPO,Masa Penawaran Umum,nama,kode,tanggal_penawaran_awal,tanggal_ipo_x,diff_days,28_days_date,tgl_awal_coalesced,tgl_poin_1,tanggal_ipo_plus_1,tgl_awal_coalesced_p_1,Unnamed: 0,Nama.Emiten,Sektor_y,tanggal_ipo_y,harga_ipo


In [None]:
# The stock code TRUE seems to come back from Excel as the boolean True; restore it as a string
df_all_11_21['kode_saham'] = df_all_11_21['kode_saham'].apply(lambda x: 'TRUE' if x == True else x)

df_join = df_all_11_21.merge(df_fileipo, left_on='kode_saham', right_on='kode', how='left')
df_join['rawContent'] = df_join['rawContent'].apply(lambda x: re.sub('[\r\n]', ' ', str(x)))
df_join['renderedContent'] = df_join['renderedContent'].apply(lambda x: re.sub('[\r\n]', ' ', str(x)))
df_join['user_rawDescription'] = df_join['user_rawDescription'].apply(lambda x: re.sub('[\r\n]', ' ', str(x)))
df_join['user_renderedDescription'] = df_join['user_renderedDescription'].apply(lambda x: re.sub('[\r\n]', ' ', str(x)))
df_join.head()

Unnamed: 0.1,url,date,rawContent,renderedContent,id,replyCount,retweetCount,likeCount,quoteCount,conversationId,lang,source,sourceUrl,sourceLabel,inReplyToTweetId,content,outlinks,outlinksss,tcooutlinks,tcooutlinksss,username,kode_saham,nama_saham,unitl_date,user_username,user_id,user_displayname,user_rawDescription,user_renderedDescription,user_verified,user_created,user_followersCount,user_friendsCount,user_statusesCount,user_favouritesCount,user_listedCount,user_mediaCount,user_location,user_protected,user_profileImageUrl,user_profileBannerUrl,user_url,until_date,No.,Kode,Nama Emiten,Sektor_x,Tanggal Pencatatan IPO,Tanggal Penawaran Awal,Harga Penawaran Awal,Harga IPO,Masa Penawaran Umum,nama,kode,tanggal_penawaran_awal,tanggal_ipo_x,diff_days,28_days_date,tgl_awal_coalesced,tgl_poin_1,tanggal_ipo_plus_1,tgl_awal_coalesced_p_1,Unnamed: 0,Nama.Emiten,Sektor_y,tanggal_ipo_y,harga_ipo
0,https://twitter.com/peyete/status/152628446521...,2010-12-16T04:32:00+00:00,"Ada museumnya ;p (@ PT Martina Berto, Kawasan ...","Ada museumnya ;p (@ PT Martina Berto, Kawasan ...",15262844652101632,0,0,0,0,15262844652101632,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,"Ada museumnya ;p (@ PT Martina Berto, Kawasan ...",[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0
1,https://twitter.com/peyete/status/152625905401...,2010-12-16T04:30:59+00:00,Menyusuri lorong cirebon (@ Martina Berto) htt...,Menyusuri lorong cirebon (@ Martina Berto) htt...,15262590540193792,0,0,0,0,15262590540193792,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,Menyusuri lorong cirebon (@ Martina Berto) htt...,[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0
2,https://twitter.com/peyete/status/152398920490...,2010-12-16T03:00:48+00:00,"Gedungnya harum ;p (@ PT Martina Berto, Kawasa...","Gedungnya harum ;p (@ PT Martina Berto, Kawasa...",15239892049076224,0,0,0,0,15239892049076224,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,"Gedungnya harum ;p (@ PT Martina Berto, Kawasa...",[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0
3,https://twitter.com/nurghozan/status/145025929...,2010-12-14T02:11:02+00:00,"Ajarin eike ya boow..semangat!☺""@irmafauzan: T...","Ajarin eike ya boow..semangat!☺""@irmafauzan: T...",14502592939827200,0,0,0,0,14502592939827200,in,"<a href=""http://blackberry.com/twitter"" rel=""n...",http://blackberry.com/twitter,Twitter for BlackBerry®,0,"Ajarin eike ya boow..semangat!☺""@irmafauzan: T...",[],,[],,nurghozan,MBTO,Martina Berto,2010-12-17,nurghozan,107985267,Euis Hadiyono,Hadiyono's daughter | Hasan's wife :),Hadiyono's daughter | Hasan's wife :),False,2010-01-24T12:09:33+00:00,194,145,5760,115,1,94,Indonesia,False,https://pbs.twimg.com/profile_images/567472921...,https://pbs.twimg.com/profile_banners/10798526...,https://twitter.com/nurghozan,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0
4,https://twitter.com/Irma_pulungan/status/14481...,2010-12-14T00:49:05+00:00,Training make up TREND 2011 SARIAYU MARTHA TIL...,Training make up TREND 2011 SARIAYU MARTHA TIL...,14481972520820736,0,0,0,0,14481972520820736,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,Training make up TREND 2011 SARIAYU MARTHA TIL...,[],,[],,Irma_pulungan,MBTO,Martina Berto,2010-12-17,Irma_pulungan,138087271,Irmalasari Pulungan,Be your self...simple & happy with my life...,Be your self...simple & happy with my life...,False,2010-04-28T15:51:42+00:00,43,52,489,237,0,11,"ÜT: -6.354193,106.839067",False,https://pbs.twimg.com/profile_images/378800000...,https://pbs.twimg.com/profile_banners/13808727...,https://twitter.com/Irma_pulungan,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0
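
The cleansing step replaces every carriage return and line feed with a space, so a Windows-style `\r\n` line break becomes two spaces. If single spaces are preferred, a run of line-break characters can be collapsed in one go, as in this small sketch (`flatten_newlines` is a hypothetical name):

```python
import re

def flatten_newlines(text):
    """Replace each run of CR/LF characters with a single space."""
    return re.sub(r'[\r\n]+', ' ', str(text))

print(flatten_newlines('line one\r\nline two'))  # line one line two
```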


In [None]:
df_join['date'] = df_join['date'].apply(parser.parse)
df_join['date_differ'] = (df_join['tgl_awal_coalesced'].dt.date - df_join['date'].dt.date) / np.timedelta64(1, 'D')
df_join['date_differ_preipo'] = (df_join['tgl_poin_1'].dt.date - df_join['date'].dt.date) / np.timedelta64(1, 'D')
df_join['fase'] = df_join['date_differ'].apply(lambda x: 'fase 1' if x > 0 else 'fase 2')
df_join['is_in_preipofase'] = df_join['date_differ_preipo'].apply(lambda x: 1 if x > 0 else 0)
df_join.head()

Unnamed: 0.1,url,date,rawContent,renderedContent,id,replyCount,retweetCount,likeCount,quoteCount,conversationId,lang,source,sourceUrl,sourceLabel,inReplyToTweetId,content,outlinks,outlinksss,tcooutlinks,tcooutlinksss,username,kode_saham,nama_saham,unitl_date,user_username,user_id,user_displayname,user_rawDescription,user_renderedDescription,user_verified,user_created,user_followersCount,user_friendsCount,user_statusesCount,user_favouritesCount,user_listedCount,user_mediaCount,user_location,user_protected,user_profileImageUrl,user_profileBannerUrl,user_url,until_date,No.,Kode,Nama Emiten,Sektor_x,Tanggal Pencatatan IPO,Tanggal Penawaran Awal,Harga Penawaran Awal,Harga IPO,Masa Penawaran Umum,nama,kode,tanggal_penawaran_awal,tanggal_ipo_x,diff_days,28_days_date,tgl_awal_coalesced,tgl_poin_1,tanggal_ipo_plus_1,tgl_awal_coalesced_p_1,Unnamed: 0,Nama.Emiten,Sektor_y,tanggal_ipo_y,harga_ipo,date_differ,date_differ_preipo,fase,is_in_preipofase
0,https://twitter.com/peyete/status/152628446521...,2010-12-16 04:32:00+00:00,"Ada museumnya ;p (@ PT Martina Berto, Kawasan ...","Ada museumnya ;p (@ PT Martina Berto, Kawasan ...",15262844652101632,0,0,0,0,15262844652101632,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,"Ada museumnya ;p (@ PT Martina Berto, Kawasan ...",[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0,0.0,-77.0,fase 2,0
1,https://twitter.com/peyete/status/152625905401...,2010-12-16 04:30:59+00:00,Menyusuri lorong cirebon (@ Martina Berto) htt...,Menyusuri lorong cirebon (@ Martina Berto) htt...,15262590540193792,0,0,0,0,15262590540193792,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,Menyusuri lorong cirebon (@ Martina Berto) htt...,[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0,0.0,-77.0,fase 2,0
2,https://twitter.com/peyete/status/152398920490...,2010-12-16 03:00:48+00:00,"Gedungnya harum ;p (@ PT Martina Berto, Kawasa...","Gedungnya harum ;p (@ PT Martina Berto, Kawasa...",15239892049076224,0,0,0,0,15239892049076224,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,"Gedungnya harum ;p (@ PT Martina Berto, Kawasa...",[],,[],,peyete,MBTO,Martina Berto,2010-12-17,peyete,143358288,Purba Yudha Tama,"||Who belive that reading is fundamental!, @ra...","||Who belive that reading is fundamental!, @ra...",False,2010-05-13T07:34:20+00:00,213,566,6757,80,1,266,"ÜT: -37.7932977,144.9583474",False,https://pbs.twimg.com/profile_images/160180530...,https://pbs.twimg.com/profile_banners/14335828...,https://twitter.com/peyete,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0,0.0,-77.0,fase 2,0
3,https://twitter.com/nurghozan/status/145025929...,2010-12-14 02:11:02+00:00,"Ajarin eike ya boow..semangat!☺""@irmafauzan: T...","Ajarin eike ya boow..semangat!☺""@irmafauzan: T...",14502592939827200,0,0,0,0,14502592939827200,in,"<a href=""http://blackberry.com/twitter"" rel=""n...",http://blackberry.com/twitter,Twitter for BlackBerry®,0,"Ajarin eike ya boow..semangat!☺""@irmafauzan: T...",[],,[],,nurghozan,MBTO,Martina Berto,2010-12-17,nurghozan,107985267,Euis Hadiyono,Hadiyono's daughter | Hasan's wife :),Hadiyono's daughter | Hasan's wife :),False,2010-01-24T12:09:33+00:00,194,145,5760,115,1,94,Indonesia,False,https://pbs.twimg.com/profile_images/567472921...,https://pbs.twimg.com/profile_banners/10798526...,https://twitter.com/nurghozan,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0,2.0,-75.0,fase 1,0
4,https://twitter.com/Irma_pulungan/status/14481...,2010-12-14 00:49:05+00:00,Training make up TREND 2011 SARIAYU MARTHA TIL...,Training make up TREND 2011 SARIAYU MARTHA TIL...,14481972520820736,0,0,0,0,14481972520820736,in,"<a href=""http://foursquare.com"" rel=""nofollow""...",http://foursquare.com,Foursquare,0,Training make up TREND 2011 SARIAYU MARTHA TIL...,[],,[],,Irma_pulungan,MBTO,Martina Berto,2010-12-17,Irma_pulungan,138087271,Irmalasari Pulungan,Be your self...simple & happy with my life...,Be your self...simple & happy with my life...,False,2010-04-28T15:51:42+00:00,43,52,489,237,0,11,"ÜT: -6.354193,106.839067",False,https://pbs.twimg.com/profile_images/378800000...,https://pbs.twimg.com/profile_banners/13808727...,https://twitter.com/Irma_pulungan,2010-12-17,2,MBTO,Martina Berto Tbk.,Consumer Non-Cyclicals,13-Jan-11,-,650 - 850,740,3 - 7 Januari 2011,Martina Berto,MBTO,NaT,2011-01-13,,2010-12-16,2010-12-16,2010-09-30,2011-01-14,2010-12-17,107.0,Martina Berto Tbk.,Consumer Non-Cyclicals,2011-01-13,740.0,2.0,-75.0,fase 1,0


In [None]:
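df_test = df_join.groupby(['kode_saham', 'fase','is_in_preipofase']).count().reset_index()
# Combine both conditions in a single mask; chaining [...][...] makes pandas reindex the second mask and emit a warning
df_test[(df_test['fase'] == 'fase 1') & (df_test['is_in_preipofase'] == 1)].count()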



kode_saham                  173
fase                        173
is_in_preipofase            173
url                         173
date                        173
rawContent                  173
renderedContent             173
id                          173
replyCount                  173
retweetCount                173
likeCount                   173
quoteCount                  173
conversationId              173
lang                        173
source                      173
sourceUrl                   173
sourceLabel                 173
inReplyToTweetId            173
content                     173
outlinks                    173
outlinksss                  173
tcooutlinks                 173
tcooutlinksss               173
username                    173
nama_saham                  173
unitl_date                  173
user_username               173
user_id                     173
user_displayname            173
user_rawDescription         173
user_renderedDescription    173
user_ver
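
The filter above can be sketched on a toy frame. This is a minimal illustration, not the notebook's data: `df_toy`, its values, and the `n_tweets` column name are invented here. It builds one boolean mask with `&` instead of chaining two indexers, and uses `size()` so each group yields a single count rather than one count per column.

```python
import pandas as pd

# Toy stand-in for df_join; the values are invented for illustration only.
df_toy = pd.DataFrame({
    "kode_saham": ["EMDE", "EMDE", "MBTO", "MBTO", "MBTO"],
    "fase": ["fase 1", "fase 2", "fase 1", "fase 1", "fase 2"],
    "is_in_preipofase": [1, 0, 1, 1, 0],
    "id": [101, 102, 103, 104, 105],
})

# One row per (kode_saham, fase, is_in_preipofase) group with its row count.
group_sizes = (
    df_toy.groupby(["kode_saham", "fase", "is_in_preipofase"])
    .size()
    .reset_index(name="n_tweets")
)

# Combine both conditions into a single mask instead of chaining [...][...].
mask = (group_sizes["fase"] == "fase 1") & (group_sizes["is_in_preipofase"] == 1)
n_groups = int(mask.sum())

print(group_sizes[mask])
print("groups in fase 1 and pre-IPO:", n_groups)
```

Because the mask is computed on the same frame it is applied to, no index realignment is needed and the warning does not appear.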

In [None]:
df_join.groupby('fase').count()

fase,url,date,rawContent,renderedContent,id,replyCount,retweetCount,likeCount,quoteCount,conversationId,lang,source,sourceUrl,sourceLabel,inReplyToTweetId,kode_saham,nama_saham,since_date,unitl_date,user_username,user_id,user_displayname,user_rawDescription,user_renderedDescription,user_verified,user_created,user_followersCount,user_friendsCount,user_statusesCount,user_favouritesCount,user_listedCount,user_mediaCount,user_location,user_protected,user_profileImageUrl,user_profileBannerUrl,user_url,quotedTweets_date,quotedTweets_rawContent,quotedTweets_id,quotedTweets_lang,quotedTweets_username,until_date,No.,Kode,Nama Emiten,Sektor,Tanggal Pencatatan IPO,Tanggal Penawaran Awal,Harga Penawaran Awal,Harga IPO,Masa Penawaran Umum,nama,kode,tanggal_penawaran_awal,tanggal_ipo,diff_days,28_days_date,tgl_awal_coalesced,tgl_poin_1,tanggal_ipo_plus_1,date_differ
fase 1,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,19127,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452,24452
fase 2,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,116076,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,166483,65682,166483,65682,166483,166483,166483,166483,65682
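
Some columns in the table above show a smaller number than the rest. That is expected: `groupby(...).count()` reports non-null values per column, while `size()` reports rows per group. The sketch below shows the difference on an invented frame (`df_toy` and its values are assumptions made for illustration).

```python
import numpy as np
import pandas as pd

# Toy frame with missing banner URLs; values are invented for illustration.
df_toy = pd.DataFrame({
    "fase": ["fase 1", "fase 1", "fase 2", "fase 2", "fase 2"],
    "id": [1, 2, 3, 4, 5],
    "user_profileBannerUrl": ["a", np.nan, "b", np.nan, np.nan],
})

# count(): non-null values per column, so columns with NaN report fewer.
non_null = df_toy.groupby("fase").count()

# size(): number of rows per group, independent of missing values.
rows = df_toy.groupby("fase").size()

print(non_null)
print(rows)
```

When only the number of tweets per phase is needed, `size()` avoids computing a count for every column.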


In [None]:
df_join.to_csv(DESTFILE+"df_all_clean_join.csv")

In [None]:
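# low_memory=False reads each column in one pass, which avoids the mixed-dtype warning on large files
df_join = pd.read_csv(DESTFILE+"df_all_clean_join.csv", low_memory=False)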

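
A minimal sketch of a CSV round trip that keeps column types stable, assuming the goal is to reload the frame exactly as it was saved. The frame `df_toy` and the chosen dtypes are invented for illustration, and an in-memory buffer stands in for the file on the mounted drive. Writing with `index=False` avoids an extra unnamed index column on reload, and passing `dtype` to `read_csv` fixes the column types up front.

```python
import io

import pandas as pd

# Toy frame; an in-memory buffer stands in for the CSV file on the drive.
df_toy = pd.DataFrame({
    "kode_saham": ["EMDE", "MBTO"],
    "id": [25121785301180416, 15239892049076224],
})

buffer = io.StringIO()
df_toy.to_csv(buffer, index=False)  # do not write the row index as a column
buffer.seek(0)

# Explicit dtypes keep large tweet ids as integers and ticker codes as text.
df_back = pd.read_csv(
    buffer,
    dtype={"kode_saham": str, "id": "int64"},
    low_memory=False,
)

print(df_back.dtypes)
```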


In [None]:
len(df_join['kode_saham'].drop_duplicates())

389

In [None]:
df_join.head()

Unnamed: 0,url,date,rawContent,renderedContent,id,replyCount,retweetCount,likeCount,quoteCount,conversationId,lang,source,sourceUrl,sourceLabel,inReplyToTweetId,kode_saham,nama_saham,since_date,unitl_date,user_username,user_id,user_displayname,user_rawDescription,user_renderedDescription,user_verified,user_created,user_followersCount,user_friendsCount,user_statusesCount,user_favouritesCount,user_listedCount,user_mediaCount,user_location,user_protected,user_profileImageUrl,user_profileBannerUrl,user_url,quotedTweets_date,quotedTweets_rawContent,quotedTweets_id,quotedTweets_lang,quotedTweets_username,until_date,No.,Kode,Nama Emiten,Sektor,Tanggal Pencatatan IPO,Tanggal Penawaran Awal,Harga Penawaran Awal,Harga IPO,Masa Penawaran Umum,nama,kode,tanggal_penawaran_awal,tanggal_ipo,diff_days,28_days_date,tgl_awal_coalesced,tgl_poin_1,tanggal_ipo_plus_1,date_differ,fase
0,https://twitter.com/graha11/status/25121785301...,2011-01-12 09:27:55+00:00,Berita Properti: Megapolitan Developments List...,Berita Properti: Megapolitan Developments List...,25121785301180416,0,0,0,0,25121785301180416,in,"<a href=""http://twitterfeed.com"" rel=""nofollow...",http://twitterfeed.com,twitterfeed,0,EMDE,Megapolitan Developments,2010-10-01,2011-01-13,graha11,183261332,Graha Sebelas,http://t.co/op9zvGMWSD hadir untuk memberikan ...,Graha11.com hadir untuk memberikan fasilitas p...,False,2010-08-26T15:29:31+00:00,458,328,27158,0,2,1,Jakarta,False,https://pbs.twimg.com/profile_images/111229721...,,https://twitter.com/graha11,,,,,,2011-01-13,1,EMDE,Megapolitan Developments Tbk.,Properties & Real Estate,12-Jan-11,17 - 22 Dec 2010,150 - 250,250,4 - 6 Januari 2011,Megapolitan Developments,EMDE,2010-12-17,2011-01-12,26.0,2010-12-15,2010-12-17,2010-10-01,2011-01-13,-26.0,fase 2
1,https://twitter.com/fixer_rudy_pt/status/25118...,2011-01-12 09:13:12+00:00,Kompas: Megapolitan Developments Listing di BE...,Kompas: Megapolitan Developments Listing di BE...,25118083190362112,0,0,0,0,25118083190362112,in,"<a href=""https://dlvrit.com/"" rel=""nofollow"">d...",https://dlvrit.com/,dlvr.it,0,EMDE,Megapolitan Developments,2010-10-01,2011-01-13,fixer_rudy_pt,202626595,Rudy@PT,マーケッター兼トレーダー / EC、アフィリエイト、FX、etc... / 広告費0円で半年...,マーケッター兼トレーダー / EC、アフィリエイト、FX、etc... / 広告費0円で半年...,False,2010-10-14T13:30:38+00:00,5066,0,101865,0,48,8,"起業8ヶ月目で月収1,000万円稼いだマーケッターの全て⬇️",False,https://pbs.twimg.com/profile_images/116051397...,https://pbs.twimg.com/profile_banners/20262659...,https://twitter.com/fixer_rudy_pt,,,,,,2011-01-13,1,EMDE,Megapolitan Developments Tbk.,Properties & Real Estate,12-Jan-11,17 - 22 Dec 2010,150 - 250,250,4 - 6 Januari 2011,Megapolitan Developments,EMDE,2010-12-17,2011-01-12,26.0,2010-12-15,2010-12-17,2010-10-01,2011-01-13,-26.0,fase 2
2,https://twitter.com/achaidar/status/2511639805...,2011-01-12 09:06:30+00:00,Sunyalangu News .... Megapolitan Developments ...,Sunyalangu News .... Megapolitan Developments ...,25116398053236736,0,0,0,0,25116398053236736,in,"<a href=""https://www.google.com/"" rel=""nofollo...",https://www.google.com/,Google,0,EMDE,Megapolitan Developments,2010-10-01,2011-01-13,achaidar,227265364,Abu Chaidar,Seneng-seneng Aja,Seneng-seneng Aja,False,2010-12-16T11:03:01+00:00,51,9,215735,0,0,0,,False,https://pbs.twimg.com/profile_images/119242609...,,https://twitter.com/achaidar,,,,,,2011-01-13,1,EMDE,Megapolitan Developments Tbk.,Properties & Real Estate,12-Jan-11,17 - 22 Dec 2010,150 - 250,250,4 - 6 Januari 2011,Megapolitan Developments,EMDE,2010-12-17,2011-01-12,26.0,2010-12-15,2010-12-17,2010-10-01,2011-01-13,-26.0,fase 2
3,https://twitter.com/SM_Sekuritas/status/251141...,2011-01-12 08:57:40+00:00,"Mengawali 2011, PT Megapolitan Developments Tb...","Mengawali 2011, PT Megapolitan Developments Tb...",25114176036802560,0,0,0,0,25114176036802560,in,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",http://twitter.com,Twitter Web Client,0,EMDE,Megapolitan Developments,2010-10-01,2011-01-13,SM_Sekuritas,228649152,Sinarmas Sekuritas,Official Twitter Sinarmas Sekuritas. Download ...,Official Twitter Sinarmas Sekuritas. Download ...,False,2010-12-20T09:03:25+00:00,4384,9,6959,39,50,1188,Plaza BII Tower III Lt.5,False,https://pbs.twimg.com/profile_images/968690257...,https://pbs.twimg.com/profile_banners/22864915...,https://twitter.com/SM_Sekuritas,,,,,,2011-01-13,1,EMDE,Megapolitan Developments Tbk.,Properties & Real Estate,12-Jan-11,17 - 22 Dec 2010,150 - 250,250,4 - 6 Januari 2011,Megapolitan Developments,EMDE,2010-12-17,2011-01-12,26.0,2010-12-15,2010-12-17,2010-10-01,2011-01-13,-26.0,fase 2
4,https://twitter.com/AiLapYuPul/status/25113862...,2011-01-12 08:56:26+00:00,Megapolitan Developments Listing di BEI http:/...,Megapolitan Developments Listing di BEI http:/...,25113862256730112,0,0,0,0,25113862256730112,in,"<a href=""http://twitterfeed.com"" rel=""nofollow...",http://twitterfeed.com,twitterfeed,0,EMDE,Megapolitan Developments,2010-10-01,2011-01-13,AiLapYuPul,62973657,Ai Lap Yu Pul,"Mbah Surip (born Urip Ariyanto, 6 May 1957 - 4...","Mbah Surip (born Urip Ariyanto, 6 May 1957 - 4...",False,2009-08-05T00:12:08+00:00,956,1577,310401,9,4,0,Tak Gendong Kemana-mana,False,https://pbs.twimg.com/profile_images/359890768...,,https://twitter.com/AiLapYuPul,,,,,,2011-01-13,1,EMDE,Megapolitan Developments Tbk.,Properties & Real Estate,12-Jan-11,17 - 22 Dec 2010,150 - 250,250,4 - 6 Januari 2011,Megapolitan Developments,EMDE,2010-12-17,2011-01-12,26.0,2010-12-15,2010-12-17,2010-10-01,2011-01-13,-26.0,fase 2


In [None]:
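# Round-trip through UTF-16 with 'surrogatepass' so surrogate pairs in the tweet text are merged into single code points
df_mark[df_mark['id'] == 866321037989879813]['rawContent'].to_list()[0].encode('utf-16', 'surrogatepass').decode('utf-16')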

"@_ewokalypse @godhateschloee @GaultKylee @markiplier @iceddarkroast AUHHHH im 4'10 \\U0001f624"
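
The output above still contains a literal `\U0001f624` sequence, which suggests the stored text holds the escape as plain characters rather than the emoji itself. The UTF-16 round trip only repairs surrogate pairs, so it leaves such text unchanged. The sketch below is one possible way to convert literal eight-digit `\U` escapes into characters; the helper `decode_unicode_escapes` is hypothetical and not part of the notebook.

```python
import re

# Matches a literal backslash, "U", and exactly eight hex digits.
_ESCAPE = re.compile(r"\\U([0-9a-fA-F]{8})")


def decode_unicode_escapes(text: str) -> str:
    """Replace literal \\UXXXXXXXX sequences with the character they name."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Sample mirrors the tail of the tweet shown above (backslash is literal here).
sample = "AUHHHH im 4'10 \\U0001f624"
decoded = decode_unicode_escapes(sample)
print(decoded)
```

Text without any escape sequence passes through unchanged, so the helper could be applied to a whole column with `Series.map`.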

In [None]:
df_text_only = df_join[['renderedContent', 'kode', 'fase']]
df_text_only.to_csv(DESTFILE+"df_text_only.csv")

In [None]:
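# low_memory=False reads each column in one pass, which avoids the mixed-dtype warning on large files
df_join = pd.read_csv(DESTFILE+"df_all_clean_join.csv", low_memory=False)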



In [None]:
# Number of tweets (non-null ids) per stock code; select the column before counting
df_count = df_join.groupby('Kode')['id'].count().reset_index()

In [None]:
df_count.to_csv(DESTFILE+"count.csv")