# Long Text Sentiment

So far, we have restricted the length of the text being fed into our models. BERT in particular can consume at most 512 tokens per sample. For many use cases this is not a problem, but in some cases it is.

If we take the example of Reddit posts on the */r/investing* subreddit, many of the more important posts are **DD** (due-diligence), which often consists of deep dives into why the author thinks a stock is a good investment or not. On these longer pieces of text, the actual sentiment from the author may not be clear from the first 512 tokens. We need to consider the full post.

Before working through the logic that allows us to consider the full post, let's import and define everything we need to make a prediction on a single chunk of text (using much of what we covered in the last section).

In [1]:
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# initialize our model and tokenizer
tokenizer = BertTokenizer.from_pretrained('ProsusAI/finbert')
model = BertForSequenceClassification.from_pretrained('ProsusAI/finbert')

# and we will place the processing of our input text into a function for easier prediction later
def sentiment(tokens):
    # get output logits from the model
    output = model(**tokens)
    # convert to probabilities
    probs = torch.nn.functional.softmax(output[0], dim=-1)
    # we will return the probability tensor (we will not need argmax until later)
    return probs
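
To make the conversion from logits to probabilities concrete, here is a minimal pure-Python sketch of what a softmax over the last dimension computes. The `softmax` helper and the example logits are made up for illustration; they are not FinBERT outputs.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # subtract the max for numerical stability (does not change the result)
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical logits for a three-class model
example_logits = [2.0, 0.5, -1.0]
example_probs = softmax(example_logits)
print(example_probs)
```

The largest logit always maps to the largest probability, which is why we can postpone the argmax until after the probabilities have been combined.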

Now let's get to how we apply sentiment to longer pieces of text. There are two approaches that we cover in these notebooks:

* Using neural text summarization to shorten the text to below 512 tokens.

* Iterating through the text using a *window* and calculating the average sentiment across all windows.
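
As a rough sketch of the averaging step in the window approach, assume we already have one probability list per window. The `mean_sentiment` helper and the numbers below are hypothetical and only illustrate the arithmetic.

```python
def mean_sentiment(window_probs):
    """Average class probabilities across windows (element-wise mean)."""
    n_windows = len(window_probs)
    n_classes = len(window_probs[0])
    return [
        sum(probs[c] for probs in window_probs) / n_windows
        for c in range(n_classes)
    ]

# hypothetical probabilities for three windows of one long post
window_probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
]
avg = mean_sentiment(window_probs)
winner = avg.index(max(avg))  # argmax over the averaged probabilities
print(avg, winner)
```

A plain mean gives every window the same weight, so a short final window counts as much as a full one. Weighting by window length would be an alternative design.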

In this notebook we will be using the second approach. The window in question will be a subsection of our tokenized text, of length `512`. First, let's define an example and tokenize it.

In [15]:
print("""
'You can look at their books, but I believe it’s difficult to make an informed decision about $RBLX without some experience with the website and games themselves. Here are some of my thoughts from being an active member since 2007. Feel free to ask me any questions you might have about the platform.\n\nRoblox has been available to the public since 2006; you’re not investing in a new up &amp; coming company. It’s been around for 16 years, which is very long for an online game. That in itself is a good sign, but its potential and success is pretty cemented as of now. The lack of worldwide reach (with the majority of users being from North America) is worrisome, but there may be room for growth.\n\nThey are currently sitting at **2.4 billion** registered users. The daily active user (DAU) count is somewhere around **30 million** (see S-1 form in my edit).  Thus, the vast majority of registered users are inactive users or bots who prey on dumb kids to steal their account info via cookies. This is more of an anecdote than something you can really back up with hard numbers and data because it’s obviously not something Roblox tracks itself. Every group and comments section on Roblox is plagued by bots spamming fishy links that take you to places for “free Robux” and such. There’s a game on Roblox called New User Machine that tracks the total amount of players and shows the most recently created ones. Basically, it’s a conveyer belt of bots with random letter &amp; number usernames. Their security and captcha, to summarize, sucks.\n\nRoblox itself is a game client, but the website houses a variety of games made by users, of varying quality. There are some real gems but some real garbage as well. The front page cycles the same popular games and not a lot of up-and-coming ones. I can imagine it would be quite difficult to break into the algorithm if you’re not already a popular developer. Some people use/pay for bots to pump up their game’s likes and user count. 
This is obviously against the site\'s TOS, but it’s difficult to get ahead unless you pour hundreds into on-site ads.\n\nYears ago, they shut down the forum, which contributed to a lot of helpful and insightful on-site discussion. They did so because they believed it was becoming difficult to moderate. Hire more moderators? That on top of the fact that the chat filter (both on-site and in-game) is becoming increasingly overbearing, it’s difficult to hold a conversation on the website, let alone form a tight-knit community. I feel a community is **essential** to the success of a social game platform like this one; right now, it feels like a soulless, barren corporate wasteland, compared to what it was before. Take a look at their [logo change](https://logos-world.net/wp-content/uploads/2020/10/Roblox-Logo-History.jpg), which summarizes the company’s new ethos well. There’s the argument that the chat filter needs to be overbearing because of the possibility of online predators - which is true - but there’s a difference between overbearing and broken. Sometimes, every second word is censored, even completely innocent ones.\n\nTheir customer support is pretty dismal. Not much else to say here, but I’ve personally dealt with it, and unless you are asking something really basic that can be answered by looking at the FAQ, they’re not much help. Just your typical outsourced copy &amp; paste replies. When it comes to accidental moderation resulting in bans or account deletion, or account theft, they’re not very helpful. I’ve had an account made in 2010 deleted in 2016 for something I did not do (account theft; the account itself was stolen) and they were no help.\n\nYears ago as well, the “free” currency Tickets were removed, leaving the premium one, Robux. Tickets were awarded on daily login and could be used to purchase cosmetics and such. 
Now, join any game and almost every avatar you see is the default one because the vast majority of kids aren’t paying for the premium membership which gives you monthly Robux, nor are they paying for Robux itself.\n\nThus, parent’s wallets are the limiting factor on how much Roblox can grow, because a lot of the older base such as myself shares the same sour sentiment regarding the website as a whole. Even newer users are seen calling for the “glory days” of Roblox to come back, which is odd since they themselves did not experience those “glory days”. That being said, the general community sentiment seems that not many people are happy with the site in its current state, which makes me wonder if it’s current young user base is sustainable or if Roblox depends on cycling generations of kids and isn’t really capable of building an older following in its current state.\n\nMoving on, there’s a thriving black market for Robux and limited cosmetic items because of the aforementioned poor on-site economy. The cosmetics catalog used to have sales for holidays like Black Friday and Memorial Day. These are no longer a thing as of the last few years, which seems like a simple enough thing to do to keep Robux flowing in the economy, but they refuse to do so. (??)\n\nSpeaking of site-wide events, celebrations like a yearly Egg Hunt, events which were dear to many users, are no more. Instead, they are replaced with corporate promotional events. These don’t yield independent games made by Roblox themselves like the Egg Hunt; instead, independent developers are “contracted” to shimmy these elements into their games. 
Doesn’t make for a very memorable experience, and I don’t know if the microtransactions yielded are any higher than they were for the previous events.\n\nThe search function on the website is pretty broken; for example, if you search the cosmetics catalog for “Adidas hoodie”, that can net you “ADIDAS ADIDAS ADIDAS ADIDAS” named items that may not be a hoodie or relevant to what you’re looking for at all, just something that’s spammed the tag you searched for. This has been the case for years and no attempt to remedy it is evident.\n\nRecently, user-generated content (UGC) has been introduced to the cosmetics catalog, in which approved users can upload hats and other cosmetics rather than just Roblox themselves as before. One of the only good features introduced in the last few years in my opinion, but as a result, Roblox itself has essentially stopped making any cosmetics themselves.\n\nBasically, with the content creation almost exclusively done by independent users and developers, the site now runs itself. As a result, it feels the platform has been stagnating creatively for years. Beyond updating the game client, it’s not really clear what (if anything) Roblox does behind the scenes to promote growth. There’s a lot of very simple improvements and features that could be introduced to satisfy the user base and maintain the appeal to kids and the older audience, but Roblox seems to be reluctant to do so for some reason. For instance, it’s weird that users are begging for on-site sales, which are such a cornerstone of almost any business, or site events, which should be essential as a social game platform. This makes me worry they are forgetting that their user base made them successful in the first place. In short, it doesn’t seem to have the exponential growth it saw from 2007 to the early/mid-2010s, which is worrisome as the platform as a whole is not doing a good job at creating a loyal user base.\n\nAll that being said, this is my perspective of a player. 
I hope a developer can chime in with their thoughts on the pros and cons of developing on Roblox, what their Robux income looks like, and how that translates into real-world currency.\n\n**EDIT:** This post has blown up way past the point I expected it to. As such, I feel it\'s necessary to address some of the points that myself and others have brought up in this thread. Going forward, this will be more of a financial approach that can tie in with my product/customer analysis.\n\n[Here is Roblox\'s S-1 form.](https://www.sec.gov/Archives/edgar/data/1315098/000119312520298230/d87104ds1.htm) I want to specifically focus on a few sections:\n\n&gt;***We have a history of net losses and we may not be able to achieve or maintain profitability in the future.***  \n&gt;  \n&gt;We have incurred net losses since our inception, and we expect to continue to incur net losses in the near future. We incurred net losses of $97.2\xa0million, $86.0\xa0million, and $203.2\xa0million for the years ended December\xa031, 2018 and 2019, and the nine months ended September\xa030, 2020, respectively. As of September\xa030, 2020, we had an accumulated deficit of $484.0\xa0million. We also expect our operating expenses to increase significantly in future periods, and **if our** **DAU growth does not increase to offset these anticipated increases in our operating expenses, our business, results of operations, and financial condition will be harmed, and we may not be able to achieve or maintain profitability**. We expect our costs and expenses to increase in future periods as we intend to continue to make significant investments to grow our business. These efforts may be more costly than we expect and may not result in increased revenue or growth of our business. In addition to the expected costs to grow our business, we also expect to incur significant additional legal, accounting, and other expenses as a newly public company. 
If we fail to increase our revenue to sufficiently offset the increases in our operating expenses, we will not be able to achieve or maintain profitability in the future.\n\nRoblox is currently relying upon their Daily Active Users (DAU) to increase past their record all-time high to achieve profitability after a history of net loss. This record was achieved mostly likely due to the current pandemic keeping kids locked inside. They touch on this here:\n\n&gt;***We have experienced rapid growth in recent periods, and our recent growth rates may not be indicative of our future growth or the growth of our market.***  \n&gt;  \n&gt;We have experienced rapid growth in the three months ended June\xa030, 2020, September\xa030, 2020 and for a portion of the three months ended March\xa031, 2020, due in part to the COVID-19 pandemic given our users have been online more as a result of global COVID-19 shelter-in-place policies. For example, our bookings increased 171% from the nine-months ended September\xa030, 2019 to the nine months ended September\xa030, 2020. We do not expect these activity levels to be sustained, and in future periods we expect growth rates for our revenue to decline, and we may not experience any growth in bookings or our user base during periods where we are comparing against COVID-19 impacted periods (i.e. the three months ended March\xa031, 2020, June 30, 2020, and September 30, 2020). Our historical revenue, bookings and user base growth should not be considered indicative of our future performance. 
We believe our overall acceptance, revenue growth and increases in bookings depend on a number of factors, including, but not limited to, our ability to:  \n&gt;  \n&gt;• **expand the number of developers, creators, and users on our platform;**  \n&gt;  \n&gt;• **provide excellent customer experience and customer support for our developers, creators, and users;**  \n&gt;  \n&gt;**• increase global awareness of our brand.**\n\nThe bolded bullet points here tie in directly with points I have mentioned in my original post, along with this section here:\n\n&gt;***We depend on our developers to create digital content that our users find compelling, and our business will suffer if we are unable to entertain our users, improve the experience of our users, or properly incentivize our developers and creators to develop content.***  \n&gt;  \n&gt;Our platform enables our developers to create experiences and virtual items, which we refer to as user generated content. Our platform relies on our developers to create experiences and virtual items on our platform for our users to acquire and/or use. Our users interact with these experiences, which are largely **free** to engage with.\n\nLargely free to engage with - this echoes my previous sentiment in which few players are really seen spending money on this game, at least compared to the amount who don\'t.\n\nSo, let\'s conclude. Where do we stand?\n\n* [Roblox has developers bringing in millions of dollars, with one of them reportedly bringing in $50 million as of 2020.](https://preview.redd.it/vzbeykpuffj61.png?width=1308&amp;format=png&amp;auto=webp&amp;s=fb544d5496049a3bd14bce8db3cad31e1896db6a)\n   * Another thing to note here that I thought was funny was the President mentioning his Roblox account being a company goal. 
Ambitious, or last-ditch effort at marketing?\n   * This is taken from the [2020 Roblox Developer\'s Conference.](https://blog.roblox.com/2020/07/rdc-2020-recap/)\n* [Roblox has an Amazon store selling merchandise and toys, the latter of which are also available in stores worldwide.](https://www.amazon.com/roblox?=&amp;_encoding=UTF8&amp;tag=r05d13-20&amp;linkCode=ur2&amp;linkId=5562fc29c05b45562a86358c198356eb&amp;camp=1789&amp;creative=9325&amp;productGridPageIndex=2)\n* Roblox runs its own data centers to deliver its platform.\n   * For the nine months ended September\xa030, 2019, direct infrastructure costs were $58.2\xa0million, or 13% of bookings, and grew by 66% to $96.8\xa0million, or 8% of bookings, in the nine months ended September\xa030, 2020.\n      * This data is taken from the S-1 form.\n\nSo, it\'s clear that the company is well cemented and has elements that should signify growth, but it\'s still unclear why such significant investment and presence in the online video game market has not yet translated to profit. This makes it difficult for me to make a bull case for $RBLX. The DPO on March 10th is expected to be around $45. Had this offering been in the single digits or teens, I believe a great long value play could\'ve been made. For me, $45 is a bit too rich for my blood for a company that is hinging on continued growth past the pandemic and a lot of other things going right that should have already happened in the last 16 years that they have been in business.\n\n**EDIT 2**: I feel the need to address a point brought up in [this comment chain, which is a great read and offers the developer insight I was looking for.](https://www.reddit.com/r/investing/comments/lrl4qf/thoughts_from_my_personal_experience_with_roblox/gooabb3/?context=3) A lot of Roblox\'s viability for future growth hinges on continuing to grow as a game engine &amp; platform rather than just a social platform for kids that happens to have games. 
Here are my thoughts on that, as things stand now.\n\nFor Roblox to be taken more seriously as a game engine, like Unity or Unreal, they need to continue to rebrand and move past the image of "kid\'s Lego game/Minecraft alternative", which will consume even more capital than they\'re already investing, not to mention the costs of simply going public. Their VC funding last year leading to their 30M valuation is 7x bigger than the funding of the previous year. Whether that symbolizes huge future growth or dumping the company on retail investors as a backup is up to your belief in the company.\n\nThe game engine uses Lua, and it has improved a lot in the past years. [Here are some images showing what\'s possible.](https://preview.redd.it/s5b0bxgqkfj61.png?width=690&amp;format=png&amp;auto=webp&amp;s=cc1447a41923c6ec69d9626a644d486c54c3af78) Looks great, right? Yes, but their target demographic doesn\'t have cutting-edge PCs that can run those graphics. From their S1:\n\n&gt;68% of our engagement hours on the platform were from users who signed up through the Apple App Store and Google Play Store.\n\nThese kids play on tablets, not supercomputers. That being said, any game developed in Roblox needs to work with the economy of the website. Unity and Unreal don\'t suffer from this, because they were never a social platform for kids. Whether or not that is a limiting factor to growth depends on the games developers want to make. You must also keep in mind that whatever is developed on Roblox needs to be gobbled up by an age group that is predominantly under 13. From their S1:\n\n&gt;For the nine months ended September 30, 2020, 54% of our users were under the age of 13.\n\nA lot of accounts could have fake birthdays, lending further to the idea of a young player base. As a result, complex games with cutting-edge industry tech isn\'t the cash cow here. This means that all the money invested in improving the platform might not result in tangible returns. 
Furthermore, the most popular games are cash grab "simulators" and carbon copies of games that already exist, sometimes with stolen assets. That presents a copyright problem if Roblox is to continue expanding.\n\nIf you got this far, thanks for reading, and best of luck in your future investments.'
""")


In [25]:
txt = """
I would like to get your all  thoughts on the bond yield increase this week.  I am not worried about the market downturn but the sudden increase in yields. On 2/16 the 10 year bonds yields increased by almost  9 percent and on 2/19 the yield increased by almost 5 percent.

Key Points from the CNBC Article:

* **The “taper tantrum” in 2013 was a sudden spike in Treasury yields due to market panic after the Federal Reserve announced that it would begin tapering its quantitative easing program.**
* **Major central banks around the world have cut interest rates to historic lows and launched unprecedented quantities of asset purchases in a bid to shore up the economy throughout the pandemic.**
* **However, the recent rise in yields suggests that some investors are starting to anticipate a tightening of policy sooner than anticipated to accommodate a potential rise in inflation.**

The recent rise in bond yields and U.S. inflation expectations has some investors wary that a repeat of the 2013 “taper tantrum” could be on the horizon.

The benchmark U.S. 10-year Treasury note climbed above 1.3% for the first time since February 2020 earlier this week, while the 30-year bond also hit its highest level for a year. Yields move inversely to bond prices.

Yields tend to rise in lockstep with inflation expectations, which have reached their highest levels in a decade in the U.S., powered by increased prospects of a large fiscal stimulus package, progress on vaccine rollouts and pent-up consumer demand.

The “taper tantrum” in 2013 was a sudden spike in Treasury yields due to market panic after the Federal Reserve announced that it would begin tapering its quantitative easing program.

Major central banks around the world have cut interest rates to historic lows and launched unprecedented quantities of asset purchases in a bid to shore up the economy throughout the pandemic. The Fed and others have maintained supportive tones in recent policy meetings, vowing to keep financial conditions loose as the global economy looks to emerge from the Covid-19 pandemic.

However, the recent rise in yields suggests that some investors are starting to anticipate a tightening of policy sooner than anticipated to accommodate a potential rise in inflation.

With central bank support removed, bonds usually fall in price which sends yields higher. This can also spill over into stock markets as higher interest rates means more debt servicing for firms, causing traders to reassess the investing environment.

“The supportive stance from policymakers will likely remain in place until the vaccines have paved a way to some return to normality,” said Shane Balkham, chief investment officer at Beaufort Investment, in a research note this week.

“However, there will be a risk of another ‘taper tantrum’ similar to the one we witnessed in 2013, and this is our main focus for 2021,” Balkham projected, should policymakers begin to unwind this stimulus.

Long-term bond yields in Japan and Europe followed U.S. Treasurys higher toward the end of the week as bondholders shifted their portfolios.

“The fear is that these assets are priced to perfection when the ECB and Fed might eventually taper,” said Sebastien Galy, senior macro strategist at Nordea Asset Management, in a research note entitled “Little taper tantrum.”

“The odds of tapering are helped in the United States by better retail sales after four months of disappointment and the expectation of large issuance from the $1.9 trillion fiscal package.”

Galy suggested the Fed would likely extend the duration on its asset purchases, moderating the upward momentum in inflation.

“Equity markets have reacted negatively to higher yield as it offers an alternative to the dividend yield and a higher discount to long-term cash flows, making them focus more on medium-term growth such as cyclicals” he said. Cyclicals are stocks whose performance tends to align with economic cycles.

Galy expects this process to be more marked in the second half of the year when economic growth picks up, increasing the potential for tapering.

## Tapering in the U.S., but not Europe

Allianz CEO Oliver Bäte told CNBC on Friday that there was a geographical divergence in how the German insurer is thinking about the prospect of interest rate hikes.

“One is Europe, where we continue to have financial repression, where the ECB continues to buy up to the max in order to minimize spreads between the north and the south — the strong balance sheets and the weak ones — and at some point somebody will have to pay the price for that, but in the short term I don’t see any spike in interest rates,” Bäte said, adding that the situation is different stateside.

“Because of the massive programs that have happened, the stimulus that is happening, the dollar being the world’s reserve currency, there is clearly a trend to stoke inflation and it is going to come. Again, I don’t know when and how, but the interest rates have been steepening and they should be steepening further.”

## Rising yields a ‘normal feature’

However, not all analysts are convinced that the rise in bond yields is material for markets. In a note Friday, Barclays Head of European Equity Strategy Emmanuel Cau suggested that rising bond yields were overdue, as they had been lagging the improving macroeconomic outlook for the second half of 2021, and said they were a “normal feature” of economic recovery.

“With the key drivers of inflation pointing up, the prospect of even more fiscal stimulus in the U.S. and pent up demand propelled by high excess savings, it seems right for bond yields to catch-up with other more advanced reflation trades,” Cau said, adding that central banks remain “firmly on hold” given the balance of risks.

He argued that the steepening yield curve is “typical at the early stages of the cycle,” and that so long as vaccine rollouts are successful, growth continues to tick upward and central banks remain cautious, reflationary moves across asset classes look “justified” and equities should be able to withstand higher rates.

“Of course, after the strong move of the last few weeks, equities could mark a pause as many sectors that have rallied with yields look overbought, like commodities and banks,” Cau said.

“But at this stage, we think rising yields are more a confirmation of the equity bull market than a threat, so dips should continue to be bought.”
"""

tokens = tokenizer.encode_plus(txt, add_special_tokens=False)

len(tokens['input_ids'])

1345

If we tokenize this longer piece of text we get a total of **1345** tokens, far too many to fit into our BERT model, which has a maximum limit of 512 tokens. We will need to split this text into chunks of 512 tokens at a time and calculate our sentiment probabilities for each chunk separately.
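
The chunk arithmetic can be checked with a short sketch. It uses only the two numbers quoted above (1345 tokens, windows of 512); the variable names are illustrative.

```python
import math

total_len = 1345      # token count reported above
window_size = 512     # maximum BERT input length

# number of windows needed to cover every token
n_chunks = math.ceil(total_len / window_size)

# length of each window; only the last one is shorter
chunk_lengths = [
    min(window_size, total_len - start)
    for start in range(0, total_len, window_size)
]
print(n_chunks, chunk_lengths)
```

If two positions per chunk are reserved for special tokens, the usable window shrinks to 510 tokens; for this text that still gives three chunks.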

Because we are taking this slightly different approach, we have encoded our tokens using a different set of parameters to what we have used before. This time, we:

* Avoided adding special tokens (`add_special_tokens=False`), because this would add *[CLS]* and *[SEP]* tokens to the start and end of the full tokenized sequence of length **1345**. We will instead add them manually later.

* Did not specify the `max_length`, `truncation`, or `padding` parameters (we do not use any of them here).

* Returned standard Python *lists* rather than tensors by not specifying `return_tensors` (lists are the default). This makes the following logic easier to follow, but we will rewrite it using PyTorch code in the next section.
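
To illustrate what adding the special tokens manually could look like on plain lists, here is a minimal sketch. The helper name `finalize_chunk` is hypothetical, and the IDs 101, 102, and 0 are assumed to be the *[CLS]*, *[SEP]*, and padding IDs of the standard uncased BERT vocabulary. In practice it is safer to read them from `tokenizer.cls_token_id`, `tokenizer.sep_token_id`, and `tokenizer.pad_token_id`. Padding each chunk to a fixed length is also an assumption made here so that chunks could later be batched.

```python
def finalize_chunk(ids, mask, cls_id=101, sep_id=102, pad_id=0, max_len=512):
    """Wrap a chunk in [CLS]/[SEP] and pad it to max_len."""
    if len(ids) > max_len - 2:
        raise ValueError("chunk must leave room for [CLS] and [SEP]")
    # special tokens are real tokens, so they get attention mask 1
    ids = [cls_id] + ids + [sep_id]
    mask = [1] + mask + [1]
    # padding positions get attention mask 0 so the model ignores them
    pad = max_len - len(ids)
    return ids + [pad_id] * pad, mask + [0] * pad

# tiny demonstration with max_len=8 so the result is easy to read
chunk_ids, chunk_mask = finalize_chunk([1045, 2572, 2025], [1, 1, 1], max_len=8)
print(chunk_ids)
print(chunk_mask)
```

Because two positions are taken by the special tokens, each chunk can hold at most `max_len - 2` text tokens.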

In [24]:
type(tokens['input_ids'])

list

First, we break our tokenized dictionary into `input_ids` and `attention_mask` variables.

In [26]:
input_ids = tokens['input_ids']
attention_mask = tokens['attention_mask']

We can now access slices of these lists like so:

In [27]:
input_ids[16:32]

[1045,
 2572,
 2025,
 5191,
 2055,
 1996,
 3006,
 2091,
 22299,
 2021,
 1996,
 5573,
 3623,
 1999,
 16189,
 1012]

We will use this slicing to break our lists into smaller sections. Let's test it in a simple loop.

In [None]:
# define our starting position (0) and window size (number of tokens in each chunk)
start = 0
window_size = 512

# get the total length of our tokens
total_len = len(input_ids)

# initialize condition for our while loop to run
loop = True

# loop through and print out start/end positions
while loop:
    # the end position is simply the start + window_size
    end = start + window_size
    # if the end position reaches or exceeds the total length, clamp it and make this our final iteration
    if end >= total_len:
        loop = False
        end = total_len
    print(f"{start=}\n{end=}")