# Project 4: Summarizing Reddit Posts | Wallstreetbets
- Scraped Reddit post data from the Wallstreetbets subreddit
- Built a dataset out of the extracted information
- Used Hugging Face's transformers library to summarize post content

In [35]:
#Reading the dataset created with the helper script
import pandas as pd
df = pd.read_csv('WallStreetBets.csv')
df = df.head(30)

In [36]:
#Loading Hugging Face's summarization pipeline with the BART large CNN model
from transformers import pipeline
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')

In [37]:
#Defining a function to summarize text
def summarize_post(post):
    result = summarizer(post, truncation=True)
    return result[0]['summary_text']
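
The function above assumes every post is a non-empty string. If the scrape includes link-only posts, their bodies would load from the CSV as NaN, and very short posts gain little from summarization. Here is a minimal sketch of a guarded wrapper. The name `summarize_post_safe`, the `summarize_fn` argument, and the `min_words=30` cutoff are my own assumptions, not part of the original notebook.

```python
def summarize_post_safe(post, summarize_fn, min_words=30):
    """Return a summary, or a fallback when summarizing is not useful.

    summarize_fn is any callable mapping a string to a summary string,
    for example the summarize_post function defined above.
    min_words is an arbitrary cutoff chosen for illustration.
    """
    # Empty CSV fields are read by pandas as NaN (a float), not a string
    if not isinstance(post, str) or not post.strip():
        return ''
    # Posts this short are returned unchanged instead of being summarized
    if len(post.split()) < min_words:
        return post.strip()
    return summarize_fn(post)
```

In the notebook this could be called as `summarize_post_safe(post, summarize_post)` inside the loop.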

In [38]:
#Testing the function
summarize_post(df.iloc[0]['posts'])

'IBM, Fortran, and [Harlan Mills] influenced the development of the NASDAQ. SEC knew then what we know now. Banks are assholes. And they figured, well if we get rid of the human element (to some degree) this might make things more legit for more ppl than less of them.'

In [39]:
#TQDM makes progress bars look pretty
from tqdm import tqdm
summarized_posts = []

#Summarize every entry in the posts column
for post in tqdm(df['posts'], desc='Summarizing Posts', total=len(df)):
    summarized_posts.append(summarize_post(post))

Summarizing Posts:   7%|▋         | 2/30 [00:20<04:39,  9.99s/it]Your max_length is set to 142, but you input_length is only 71. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=35)
Summarizing Posts:  13%|█▎        | 4/30 [00:40<04:36, 10.63s/it]Your max_length is set to 142, but you input_length is only 134. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=67)
Summarizing Posts:  30%|███       | 9/30 [01:20<03:12,  9.18s/it]Your max_length is set to 142, but you input_length is only 56. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=28)
Summarizing Posts:  33%|███▎      | 10/30 [01:24<02:37,  7.85s/it]Your max_length is set to 142, but you input_length is only 88. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=44)
Summarizing Posts:  43%|████▎     | 13/30 [01:42<01:51,  6.57s/it]Your max_length is set to 142, but you input_length is only 
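
The warnings above appear whenever a post is shorter than the default `max_length` of 142 tokens, and each one suggests a value of half the input length. A small sketch of that rule follows. The helper name `pick_lengths` is hypothetical, `default_min=56` is my recollection of this model's configured minimum (worth checking against the model config), and halving `max_length` to get `min_length` is an arbitrary choice.

```python
def pick_lengths(input_length, default_max=142, default_min=56):
    """Choose (min_length, max_length) for an input of input_length tokens."""
    # Inputs at least as long as the default limit keep the defaults
    if input_length >= default_max:
        return default_min, default_max
    # Otherwise follow the warning's suggestion: half the input length
    max_length = max(input_length // 2, 1)
    # Keep min_length below max_length (assumed heuristic)
    min_length = min(default_min, max_length // 2)
    return min_length, max_length

# Possible use inside summarize_post (untested sketch):
# n_tokens = len(summarizer.tokenizer(post)['input_ids'])
# min_len, max_len = pick_lengths(n_tokens)
# summarizer(post, truncation=True, min_length=min_len, max_length=max_len)
```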

In [40]:
#Storing results in the original data frame
df['Summarized_Post'] = summarized_posts
df.head()

Unnamed: 0,titles,posts,subreddit,creation_date,authors,upvotes,url,Summarized_Post
0,"Remarks of Richard B. Smith, Commissioner Unit...","""What I want to discuss with you today is a pi...",wallstreetbets,1667245000.0,t2_2a5jbp59,1,https://www.reddit.com/r/wallstreetbets/commen...,"IBM, Fortran, and [Harlan Mills] influenced th..."
1,Brazil,Brazil just finished their run-off election wi...,wallstreetbets,1667244000.0,t2_6n2z4j2y,2,https://www.reddit.com/r/wallstreetbets/commen...,Brazil just finished their run-off election wi...
2,Gamelancer Media Corp. $GMNG $GAMGF has a new ...,Gamelancer Media Corp. has uploaded a new corp...,wallstreetbets,1667243000.0,t2_a1jf7gbk,3,https://www.reddit.com/r/wallstreetbets/commen...,Gamelancer Media Corp. has uploaded a new corp...
3,Trading SPY and FOMC Meeting,"Hey guys, this is another big week for the fut...",wallstreetbets,1667243000.0,t2_5ylhcs6t,8,https://www.reddit.com/r/wallstreetbets/commen...,This is another big week for the future of the...
4,$LLY earnings call,"$LLY earnings, boomer play.\n\nMy 405c FDs for...",wallstreetbets,1667243000.0,t2_gq4rwz6a,0,https://www.reddit.com/r/wallstreetbets/commen...,The reason for my great play is based off of E...


In [41]:
#Reading the full-length text of row 1
df.iloc[1]['posts']

"Brazil just finished their run-off election with Lula the former president beating the incumbent in the closest election in Brazil's democratic victory and the incumbent Jair Bolsonaro has been parroting Trump leading up to the election and has yet to concede. He was a former military leader with the support of the military and in the past has spread the sentiment that he isn't leaving. I wanted exposure to the instability of the country so I bought puts in an ETF there for after the transition of power in the new year. The Brazilian market is generally up today on Lula's victory but I think the instability is not priced in. I chose the ETF EWZ as it has a higher finance exposure which should be the first sector to hurt. I see at least some instability and at most a Coup d'état. Happy Profiting on suffering welcome to capitalism.\n\n&amp;#x200B;\n\nPositions  Jan 20 23   Put. $17 x20, $21x29, $25x11, $29x2"

In [42]:
#Comparing that against its summarized version
df.iloc[1]['Summarized_Post']

"Brazil just finished their run-off election with Lula the former president beating the incumbent in the closest election in Brazil's democratic victory. I wanted exposure to the instability of the country so I bought puts in an ETF there for after the transition of power in the new year. The Brazilian market is generally up today on Lula's victory but I think the instability is not priced in. I see at least some instability and at most a Coup d'état."
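
To put a number on the comparison above, one rough option is the ratio of summary words to original words. This is only a sketch: `compression_ratio` is a hypothetical helper, and splitting on whitespace is a crude stand-in for the model's tokenizer.

```python
def compression_ratio(original, summary):
    """Return the summary word count divided by the original word count."""
    original_words = len(original.split())
    # Avoid dividing by zero for empty posts
    if original_words == 0:
        return 0.0
    return len(summary.split()) / original_words

# Example use on the row compared above:
# compression_ratio(df.iloc[1]['posts'], df.iloc[1]['Summarized_Post'])
```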

In [43]:
#Reading the full-length text of row 5
df.iloc[5]['posts']

"Walter Bloomberg is quitting!\n\nJust a few minutes ago he published this information on his Twitter account, where comments are restricted and great speculation has been created about this news.\n\n [**\\*Walter Bloomberg**](https://twitter.com/DeItaone)[@DeItaone](https://twitter.com/DeItaone)·[1h](https://twitter.com/DeItaone/status/1587132171512340483)Hi, I have done my best to help you over the past 8 years. Unfortunately, today I can't afford to be on social media &amp; to fulfill my personal responsibilities  If you want me back every day, it can't be without your help.  \n\nThanks for your support \n\nWalter \n\n&amp;#x200B;\n\nDoes anyone know Walter Bloomberg's real name to look him up on Linkedin?\n\nI would offer to help him manage his account! What do you think are the real causes, really his information was very good. \n\nThanks Walter.\n\nInfo via Twitter.\n\n&amp;#x200B;\n\nhttps://preview.redd.it/20i185p9v6x91.png?width=900&amp;format=png&amp;auto=webp&amp;s=6513b171a

In [44]:
#Comparing that against its summarized version
df.iloc[5]['Summarized_Post']

'Walter Bloomberg is quitting Twitter. He posted the news on his Twitter account, where comments are restricted and great speculation has been created about this news. He wrote: "I have done my best to help you over the past 8 years. Unfortunately, today I can\'t afford to be on social media"'
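
The raw posts above still carry Reddit markup such as `&amp;#x200B;`, markdown links, and image URLs, all of which spend input tokens on noise. A minimal cleaning sketch that could run before summarization is shown below. `clean_post` is a hypothetical helper, and its rules (double unescaping, dropping every URL, asterisk, and backslash) are assumptions that may be too aggressive for some posts.

```python
import html
import re

def clean_post(text):
    """Strip common Reddit markup from a post body."""
    # Unescape twice: '&amp;#x200B;' -> '&#x200B;' -> zero-width space
    text = html.unescape(html.unescape(text))
    text = text.replace('\u200b', '')
    # Replace markdown links [label](url) with just the label
    text = re.sub(r'\[([^\]]*)\]\([^)]*\)', r'\1', text)
    # Drop any remaining bare URLs
    text = re.sub(r'https?://\S+', '', text)
    # Remove markdown emphasis asterisks and escape backslashes
    text = re.sub(r'[*\\]', '', text)
    # Collapse runs of whitespace, including newlines, into single spaces
    return re.sub(r'\s+', ' ', text).strip()
```

It could be applied as `df['posts'].apply(clean_post)` ahead of the summarization loop, provided every entry is a string.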