<a href="https://colab.research.google.com/github/flora0110/spotify_segment_headine/blob/main/perplexity_podcast.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Transformers installation
! pip install transformers datasets
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git

Collecting transformers
  Downloading transformers-4.17.0-py3-none-any.whl (3.8 MB)
Collecting datasets
  Downloading datasets-2.0.0-py3-none-any.whl (325 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.49-py3-none-any.whl (895 kB)
Collecting tokenizers!=0.11.3,>=0.11.1
  Downloading tokenizers-0.11.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.5 MB)

# Perplexity of fixed-length models

Perplexity (PPL) is one of the most common metrics for evaluating language models. Before diving in, we should note
that the metric applies specifically to classical language models (sometimes called autoregressive or causal language
models) and is not well defined for masked language models like BERT (see [summary of the models](https://huggingface.co/docs/transformers/main/en/model_summary)).

Perplexity is defined as the exponentiated average negative log-likelihood of a sequence. If we have a tokenized
sequence $X = (x_0, x_1, \dots, x_t)$, then the perplexity of $X$ is,

$$\text{PPL}(X) = \exp \left\{ {-\frac{1}{t}\sum_{i=1}^{t} \log p_\theta (x_i|x_{<i}) } \right\}$$

where $\log p_\theta (x_i|x_{<i})$ is the log-likelihood of the $i$-th token conditioned on the preceding tokens $x_{<i}$ according to our model. Intuitively, perplexity measures how uncertain the model is when predicting each token of the corpus: a PPL of $N$ means the model is, on average, as uncertain as if it were choosing uniformly among $N$ tokens at every step. Importantly, because the average is taken per token, the tokenization procedure has a direct impact on a model's perplexity, which should always be taken into consideration when comparing different models.
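As a quick numerical check of the definition, the sketch below computes PPL from a list of per-token log-likelihoods using only the standard library. The probabilities are made up for illustration, and `perplexity` is a hypothetical helper, not a 🤗 Transformers function.

```python
import math

def perplexity(token_log_probs):
    """PPL from per-token log-likelihoods log p(x_i | x_<i) (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one scored token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)  # mean negative log-likelihood
    return math.exp(avg_nll)

# Three scored tokens x_1..x_3 with model probabilities 1/2, 1/4 and 1/8.
log_probs = [math.log(p) for p in (0.5, 0.25, 0.125)]
print(perplexity(log_probs))  # ≈ 4.0, the inverse geometric mean of the probabilities

# A model that is uniformly unsure among 20 tokens at every step has PPL ≈ 20.
print(perplexity([math.log(1 / 20)] * 10))
```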

This is also equivalent to the exponentiation of the cross-entropy between the data and model predictions. For more
intuition about perplexity and its relationship to Bits Per Character (BPC) and data compression, check out this
[fantastic blog post on The Gradient](https://thegradient.pub/understanding-evaluation-metrics-for-language-models/).
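To make the cross-entropy equivalence concrete, here is a dependency-free sketch: it turns made-up logits into log-probabilities with a numerically stable log-softmax, averages the negative log-likelihood of the target tokens (the cross-entropy, in nats), and exponentiates. Dividing the same quantity by $\ln 2$ gives bits per token; bits per character would instead divide the total number of bits by the number of characters. The helper names and logits are illustrative assumptions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def cross_entropy(logits_per_step, targets):
    """Mean negative log-likelihood (in nats) of the target token at each step."""
    nlls = [-log_softmax(logits)[t] for logits, t in zip(logits_per_step, targets)]
    return sum(nlls) / len(nlls)

# Made-up logits over a 3-token vocabulary for three prediction steps.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
targets = [0, 2, 1]

ce = cross_entropy(logits, targets)
ppl = math.exp(ce)         # perplexity = exp(cross-entropy in nats)
bits = ce / math.log(2)    # the same cross-entropy in bits per token
print(f"CE = {ce:.4f} nats, PPL = {ppl:.4f}, {bits:.4f} bits/token")
assert math.isclose(ppl, 2 ** bits)  # PPL = 2^(bits per token)
```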

## Calculating PPL with fixed-length models

If we weren't limited by a model's context size, we would evaluate the model's perplexity by autoregressively
factorizing a sequence and conditioning on the entire preceding subsequence at each step, as shown below.

<img width="600" alt="Full decomposition of a sequence with unlimited context length" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ppl_full.gif"/>
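In symbols, this is the chain rule of probability: the log-likelihood of everything after the first token is a sum of per-token conditional terms, so the perplexity defined above is the inverse geometric mean of the per-token probabilities.

$$\log p_\theta(x_1, \dots, x_t \mid x_0) = \sum_{i=1}^{t} \log p_\theta(x_i \mid x_{<i}), \qquad \text{PPL}(X) = \left( \prod_{i=1}^{t} p_\theta(x_i \mid x_{<i}) \right)^{-1/t}$$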

When working with approximate models, however, we typically have a constraint on the number of tokens the model can
process. The largest version of [GPT-2](https://huggingface.co/docs/transformers/main/en/model_doc/gpt2), for example, has a fixed length of 1024 tokens, so we
cannot calculate $p_\theta(x_t|x_{<t})$ directly when $t$ is greater than 1024.

Instead, the sequence is typically broken into subsequences equal to the model's maximum input size. If a model's max
input size is $k$, we then approximate the likelihood of a token $x_t$ by conditioning only on the
$k-1$ tokens that precede it rather than the entire context. When evaluating the model's perplexity on a
sequence, a tempting but suboptimal approach is to break the sequence into disjoint chunks and add up the decomposed
log-likelihoods of each segment independently.

<img width="600" alt="Suboptimal PPL not taking advantage of full available context" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ppl_chunked.gif"/>

This is quick to compute since the perplexity of each segment can be computed in one forward pass, but serves as a poor
approximation of the fully-factorized perplexity and will typically yield a higher (worse) PPL because the model will
have less context at most of the prediction steps.
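The cost of disjoint chunking is easy to quantify by counting how many preceding tokens each position can see. The sketch below uses hypothetical helpers that only track positions (no model) for a 10-token sequence and a context size of $k = 4$.

```python
def disjoint_chunks(seq_len, k):
    """(begin, end) spans of non-overlapping chunks of at most k tokens."""
    return [(begin, min(begin + k, seq_len)) for begin in range(0, seq_len, k)]

def context_sizes_disjoint(seq_len, k):
    """Number of earlier tokens visible to each position under disjoint chunking."""
    sizes = []
    for begin, end in disjoint_chunks(seq_len, k):
        sizes.extend(range(end - begin))  # the token at offset j in a chunk sees j earlier tokens
    return sizes

print(disjoint_chunks(10, 4))         # [(0, 4), (4, 8), (8, 10)]
print(context_sizes_disjoint(10, 4))  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

Every chunk restarts from zero context, so the average here is 1.3 tokens, versus 2.4 if every token could see up to $k-1 = 3$ predecessors.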

Instead, the PPL of fixed-length models should be evaluated with a sliding-window strategy. This involves repeatedly
sliding the context window so that the model has more context when making each prediction.

<img width="600" alt="Sliding window PPL taking advantage of all available context" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ppl_sliding.gif"/>

This is a closer approximation to the true decomposition of the sequence probability and will typically yield a more
favorable score. The downside is that it requires a separate forward pass for each token in the corpus. A good
practical compromise is to employ a strided sliding window, moving the context by larger strides rather than sliding by
1 token at a time. This allows computation to proceed much faster while still giving the model a large context to make
predictions at each step.
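A strided sliding window can be planned with the same kind of position bookkeeping. In the sketch below (hypothetical helpers; `max_length` plays the role of $k$ and `stride` is how far the window advances), each window feeds tokens `[begin, end)` to the model but scores only the trailing `trg_len` tokens that no earlier window has scored, so every token is scored exactly once. `stride=1` is the token-by-token sliding window, and `stride=max_length` reduces to disjoint chunks.

```python
def strided_windows(seq_len, max_length, stride):
    """(begin, end, trg_len) for each window of a strided sliding-window evaluation."""
    if not 0 < stride <= max_length:
        raise ValueError("expected 0 < stride <= max_length")
    windows, prev_end = [], 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        windows.append((begin, end, end - prev_end))  # score only tokens [prev_end, end)
        prev_end = end
        if end == seq_len:
            break
    return windows

def context_sizes_sliding(seq_len, max_length, stride):
    """Number of earlier tokens visible to each position when it is scored."""
    sizes = []
    for begin, end, trg_len in strided_windows(seq_len, max_length, stride):
        sizes.extend(pos - begin for pos in range(end - trg_len, end))
    return sizes

print(strided_windows(10, 4, 2))        # [(0, 4, 4), (2, 6, 2), (4, 8, 2), (6, 10, 2)]
print(context_sizes_sliding(10, 4, 2))  # [0, 1, 2, 3, 2, 3, 2, 3, 2, 3]
print(context_sizes_sliding(10, 4, 1))  # [0, 1, 2, 3, 3, 3, 3, 3, 3, 3]
```

With `stride=2` the average context is 2.1 tokens using 4 windows; `stride=1` reaches the 2.4 maximum but needs 7 windows, i.e. 7 forward passes.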

## Example: Calculating perplexity with GPT-2 in 🤗 Transformers

Let's demonstrate this process with GPT-2.

In [2]:
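import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Use the GPU when available; fall back to CPU otherwise (gpt2-large is ~3 GB, so CPU evaluation is slow).
device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "gpt2-large"
model = GPT2LMHeadModel.from_pretrained(model_id).to(device)
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)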


We'll load in the WikiText-2 dataset and evaluate the perplexity using a few different sliding-window strategies. Since
this dataset is small and we're just doing one forward pass over the set, we can just load and encode the entire
dataset in memory.

In [3]:
from datasets import load_dataset

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.90 MiB, post-processed: Unknown size, total: 17.40 MiB) to /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126...

Dataset wikitext downloaded and prepared to /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126. Subsequent calls will reuse this data.

Token indices sequence length is longer than the specified maximum sequence length for this model (287644 > 1024). Running this sequence through the model will result in indexing errors
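The warning above is expected: the 287,644-token encoding is far longer than GPT-2's 1,024-token limit, which is exactly why a sliding window is needed rather than a single forward pass. Because the real evaluation loop needs the downloaded model, here is a dependency-free sketch of the strided sliding-window computation instead, with a hypothetical `toy_log_prob` standing in for GPT-2's conditional log-probabilities (with the real model, each window would be one forward pass). As with a causal LM's shifted labels, the first token of each window has no in-window context and is not scored.

```python
import math

def toy_log_prob(context, token, vocab_size=8):
    """Hypothetical stand-in for a causal LM: it predicts "previous token + 1"
    and grows more confident the more context it sees."""
    if not context:
        return math.log(1 / vocab_size)
    confidence = len(context) / (len(context) + 1)  # 1/2, 2/3, 3/4, ...
    if token == (context[-1] + 1) % vocab_size:
        return math.log(confidence)
    return math.log((1 - confidence) / (vocab_size - 1))

def sliding_window_ppl(tokens, max_length, stride, log_prob=toy_log_prob):
    """Perplexity with a strided sliding window; each token is scored at most once."""
    if not 0 < stride <= max_length:
        raise ValueError("expected 0 < stride <= max_length")
    seq_len = len(tokens)
    total_nll, n_scored, prev_end = 0.0, 0, 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        # Score tokens not covered by an earlier window, conditioning each one
        # only on the tokens inside the current window.
        for pos in range(max(prev_end, begin + 1), end):
            total_nll -= log_prob(tokens[begin:pos], tokens[pos])
            n_scored += 1
        prev_end = end
        if end == seq_len:
            break
    return math.exp(total_nll / n_scored)

tokens = [i % 8 for i in range(40)]  # a perfectly regular toy "corpus"
for stride in (8, 2, 1):
    print(stride, round(sliding_window_ppl(tokens, max_length=8, stride=stride), 4))
```

Smaller strides give each scored token more context and therefore a lower perplexity; `stride=8` (disjoint chunks) is the worst of the three, and `stride=1` is the best but uses the most windows.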


In [4]:

podcast_test = ['Hi friends. ', "Happy Wednesday or Thursday if you're watching over on YouTube. ", 'Hey, how are you doing? ', "I hope you're having a wonderful day so far. ", 'My name is Bailey serían and this is the dark History Podcast. ', 'Whew, I need a theme song for dark history. ', "Don't, I'm anywho, I'm here to talk about some dark ass history that we never learned in school or I should, at least say that. ", 'I never learned in school. ', "If you're new here. ", 'Hi, welcome. ', 'Come. ', 'Take a seat by me. ', "Don't be shy if you're And learning about people and events that your history books didn't cover in school. ", "The maybe should have then you'll fit in right in here with us. ", 'Okay. ', "Look today's story. ", "It's uncomfortable. ", "Okay, it's heartbreaking. ", "It's just, it's set. ", "It's awful. ", "It's sad. ", "There's, those are all the words. ", "I can think of it, just doesn't make any sense. ", "But with all of that being said, there's really like no sugarcoating this. ", 'This is what happened. ', "And it's when more honest and when we're transparent that we can learn and grow together, you know,but most of all not let history repeat itself. ", "So let's set the scene. ", "It's New Years Day 1923 and many people are waking up from celebrating the New Year. ", 'Some of them are still wearing a flapper dress or a suit tired from dancing. ', "The night away in a jazz club, or maybe they're hung over from buying illegal drinks at a speakeasy. ", 'Yes. ', 'Oh, yes. ', 'It was still illegal to buy alcohol wild, right? ', 'Of course, though. ', 'That was more of like the upscale City Vibe. ', 'If you were in the country had little less. ', 'Pocket change to spend or were to straightedge to visit, like those kind of illegal joints. 
', 'Then you probably had a quieter evening of staying at home and ringing in the new year with with your loved ones, you know, now 1923 in general President Warren G, Harding would die in office, the Yankee Stadium would be built in the Bronx and the original Hollywood sign would be built. ', 'Oh, yeah, Hollywood was originally called Hollywood land, but we could say that for another episode, babe. ','Before all of that, on the morning of January, first, nineteen, twenty three thirty year old James and his 22-year old wife, Fanny, Taylor woke up at a home and a town called Sumner, a neighboring community of Rosewood, Florida, that morning. ', 'James Taylor goes to work like he did any other day. ', 'But while James was at work, something happens to his wife Fanny. ', 'That boy, it just triggers a domino effect, a horrific domino effect. ', 'It leads to the Murders of innocent people, the destruction of a town and the displacement of an entire Community changing Rosewood forever who just sounds traumatic. ','I know because it is, welcome. ', 'Hi. ', 'This episode friends were going to be talking about the Rosewood Massacre. ', 'Have you heard of it? ', 'Yeah, exactly. ', 'Exactly. ', "I didn't this one, just it completely blew. ", 'My mind is like why in the Waldo? ', 'Did we not learned about this in school? ', 'Correction? ', 'I can only speak for myself here. ', 'So, why in the Waldo did my history books? ', 'Not teach this Rosewood Massacre? ', 'Yeah. ', 'It happened. ', "Let's discuss and learn together. ", 'What took place in 1923, Rosewood, Florida. ', 'Buckling, kitty cats. ', 'This is just what? ', "Yeah, I'm great with words. ", 'Obviously. ', "That's why I started a podcast. ", 'The beginning, great place. ', 'Start the town of Rosewood was established in the mid eighteen. ', 'Hundreds white families, had moved here prior to the Civil War and black landowner started to move in in the 1870s. 
', "It's located in the State of Florida and it's just nine miles east of the Gulf of Mexico. ", 'The name Rosewood. ', 'It came from the red cedar trees that were all over the land there. ', 'Fun fact, alert Boo. ', 'Boo. ', 'Boo. ', 'Did you know that when you cut open a red cedar tree, the cedar has a pale pink color inside? ', "Yeah, I guess it's like super pretty. ", 'Super beautiful. ', 'But did you also know that if you cut me open? ', "I've got a little pink inside a LOL. ", 'Anyways, so they called the town Rosewood after the pink color. ', 'Okay. ', 'Nice, right. ', "Well, as we've learned through history, we can't have nice things. ", 'So when people saw the really pretty color that was inside of these beautiful incredible trees. ', 'What do you think? ', 'They did? ', 'Yes, they cut them all down. ', 'They cut them down and they sold the A now because of this they made a pretty decent amount of money and it supported the economy of the town but still, you know sad for the trees by the year 1886 the town had developed quite a bit. ', 'It had its own post office, a Schoolhouse and they were three churches one for the white people into for the black people. ', "Remember, we're talking about a time that's very much a segregated America. ", 'So four years later in 1890 the town began to hit some hard times. ', 'All the red cedar trees had been chopped down and without the trees. ', "There's no money to be made which in turn leads to the town's economic decline because of the loss of income and not many available jobs, many members of the white Community left, Rosewood, and headed to the nearby town of Sumner Sumner was appealing because there was a large Sawmill which provided tons of jobs and right now the people needed work, so, sign me up for Sumner, you know, so by the 1900. ", 'The Rosewood population. ','Have now shifted to a majority black community. ', 'Now, with, in Rosewood. ', 'There was a really close sense of community. 
', 'Like, everyone seemed to know one another and looked out for one another as well. ', 'I mean, rarely did anyone need to leave town like to travel for food or for things? ', 'They may need because they had almost everything right there with in Rosewood, in addition to the three churches Schoolhouse and post office Rosewood also had a train station. ', 'In which will become important later a general store, and a Sugar Mill. ', 'Plus, if you were looking for some fun, they even had a Rosewood baseball team. ', "What I'm getting at is, Rosewood, was overall, just a nice place. ", "So, for over 20 years, the people of rosewood's stayed with in Rosewood and the people of Sumner, the neighboring town. ", 'They stayed within Sumner, they coexisted. ', "They had their own Communities going on and they were just doing their own thing, but unfortunately, that peaceful time was about To end because some shit was a bruin, but before we get into, what was a bruin, let's pause for a word from our sponsor. ", 'Things are kind of getting back to normal, I guess, right? ', 'Yeah, but I swear to you over quarantine. ', 'My brain has felt like mush and need some exercise. ', 'I mean, it is a muscle, right, you know, any who would best fiends. ', 'It feels like my brain. ', 'Got a great workout. ', "What's best fiends. ", 'You ask? ', "Well, it's the mobile puzzle game that will put A challenge to your Noggin, and it's kind of like you did a workout. ", 'But again, for your brain, not only that, best fiends is so fun. ', "You won't want to put it down + best. ", 'Fiends has literally thousands of fun, puzzles to solve. ',"I'm currently on level 50 with like a lot more to go. ", "So, there's really nothing to brag about their. ", "There's literally something new to play every day. ", 'And the characters are just so adorable. ', 'I want to be best fiends with them. ', "Best means it's constantly putting out updates. ", "So there's always something new. 
", "You exciting to explore, whenever I'm feeling a little bored or I have like, you know some time to kill between these Zoom meetings. ", 'I just bust out my phone and go straight to best fiends to tackle. ', 'Some puzzles. ', 'Download the five-star rated puzzle game. ', 'Best fiends free today on the app store or Google Play. ', "That's best friends without the our best fiends. ", 'Thanks. ', "Best beans for partnering with me on today's episode. ", 'Okay. ', "Remember, it's January 1st. ", '1923 our young couple James. ', 'And Fanny Taylor who are white are waking up on the first day of the new year. ','And James was like, goodbye Fanny. ', "Like I'm off to work. ", 'I love you. ', 'Babe. ', 'You know and Fanny is like, oh my God. ', 'Yeah. ', "Have a good day whenever that's probably what they did. ", 'Anyway, so James leaves his wife Fanny home alone because he has to go to work. ', 'He works at The Sawmill in Sumner which was about one mile east of Rosewood. ', 'Now while James is at work, Milling saws Fanny whose back at home. ', 'Rumor had it. ', 'Someone had attacked her. ', 'Oh, yeah. ', 'Oh, yeah, the neighbors heard screams. ', "They heard screams coming from within the Taylor's home and they were like, oh my God, you know, like what's going on screaming? ", 'Hmm. ', "So for you and I, we'd probably go and like check it out, right would be like, oh my God, they're screaming. ", "Are you okay, but for Fanny's neighbors, they heard the screams and then they just left it alone. ", 'Great. ', 'Yeah, awesome. ', 'So some time goes by and James comes home from Mark, and when he gets home, he sees that fanny has a big bruise on her face. ',"It's clear that somebody had hit her pretty badly. ", 'So he asked her, you know, who did this to you and Fanny tells him that a black man assaulted her. ', "She's like, I don't know who it was, but it was definitely a black man. ", 'According to Fanny. ', 'Hmm. ', 'I mean, this could be possible. 
', 'This guy came in randomly, beat her, I guess in her own house, but remember, Like this is a small town. ', 'In fact, both Sumner and Rosewood were so small that most people recognize one another. ', 'So Are you sure we honey? ', "Are you sure that's what you saw Fanny? ", 'The fact that Fanny like, could not name the person who attacked her. ', 'That should have been the first red flag. ', 'Okay. ', 'That maybe this person that quote unquote attacked her was not a resident from Rosewood. ', "I'm telling you, I'm not kidding you. ", 'If you lived in Rosewood, you would know your neighbors everybody knew each other. ', 'But Fanny would most likely know this person because again, the likelihood of her not knowing is presumptive, not? ', 'So sure Fanny sure. ', "Again, because it's a 1920s. ", 'Unfortunately, this is a time when a white woman made, bold claims of being attacked by a black person. ', 'Nobody questioned it. ', 'They just believed it to be true. ', "So they didn't question Fanny story for one second. ", 'Like, oh my God. ', 'She was beaten by a block man. ', "Like we are so angry and nobody wanted to like double check and make sure she what you she wasn't lying. ", 'They just just went with it. ', 'People are talking with in Rose Water. ',"Okay, and this story about Fanny gets back to a woman by the name of Sarah carrier and she's like, wait a minute. ", 'Wait a minute. ', ' ', ' ',"That's not true. ", "Sarah said that she was at the Taylor's Home, the morning of that so-called attack, and not just that, but her granddaughter was there with her and neither of them saw black man, attack Fanny. ", 'Alone. ', 'Even enter their house. ', "Well, you're probably wondering really Who the hell's Sarah? ", 'Okay. ', "Well Sarah, she had been working for the Tailor's for quite some time. ", "She was there that day with her granddaughter doing the Taylor's laundry. ", 'Now, Sarah said that she actually saw a white man leaving the Taylor house that morning. 
', 'She said, she never saw a black man. ', 'Come by the house all day. ', 'Now, get this. ', 'Get this. ', 'Sarah recognize this white man. ', 'She had seen him come by the house once or twice before. ', 'Hmm. ', 'Sarah had believed that this man and Fanny were having some kind of Winky dinky. ', "If you're not on me. ", "Sarah said that this man was actually the guy that beat Fanny that morning and that she was just lying to cover her ass because she's probably having an affair. ", "And she doesn't want people to know Fanny. ", "So Sarah's version is now going around town. ", "It's spreading like wildfire within the black community. ", 'They all 100% believed her. ', 'Why would she lie? ', 'Whoa, why would she lie, unfortunately, though the white voices were louder within the community and made it very clear. ', 'That Fanny was quote. ', 'Not a liar. ', 'She was telling the truth, okay. ','I freaking roll, right? ', "It gets a lot more traumatic talking amongst the accusers and name comes up, Jesse Hunter, and they think they're disgusting. ", "They're like, hey, Jesse Hunter. ", 'This is our guy. ', 'This is the guy who did it. ', "Who's Jesse Hunter? ", 'Okay. ', 'Now Jesse Hunter was a black man who recently escaped from the nearby prison. ', "Now because Jesse was a black man who recently escaped from prison while they're believing. ",'He must have done. ', 'It must be the same black man. ', 'That attacked Fanny. ', 'No, Sarcasm mind you. ', "There was absolutely 100% zero evidence that Jesse was the perpetrator, but that doesn't matter to them. ", "They've got their minds set on Jesse Hunter and they go out looking for him while out, looking for him, the county, sheriff, Robert Walker, he joins in on the search. ", 'Now, this County Sheriff, was able to get a bunch of other white men together to form. ', 'What was called a posse to help with the search. ', "Now, this isn't like your normal Posse today, like a Up of friends hanging out back. 
", 'Then a posse was a more formal thing. ', 'It was an official group of men that a sheriff could organize in case of an emergency and this to them at the time. ', 'Well, this was that kind of Emergency. ', 'The sheriff would pull in some guys, from around town. ', "I don't know how he picked them. ", "But he did and he made them his temporary deputies, which is wild to think, just ran it like, hey, you your Deputy now like That's you were allowed to do that. ", 'So the county sheriff got the Posse together and they start searching for Jesse. ', 'Now, they were determined to catch him and punish him for his alleged attack towards Fanny. ', 'The Posse even went down to the local prison to borrow a pack of bloodhounds to track Jesse sent. ', 'I mean, they were seeing red. ', 'No common sense, going on here. ', 'The Posse, they hear that. ', 'Jesse was last seen with another black man named Sam. ', 'Carter. ', 'Who worked? ', 'As the local blacksmith. ', "Now this alleged sighting has never been confirmed but they don't care. ", 'Now remember again. ', "I've said it 100 times. ", "It's a small town. ", 'So they know exactly where Sam Carter lives and they had straight to his house. ', 'They barged in. ', 'Okay, and they interrogated. ', 'Sam and Iceland. ', "Where's Jesse? ", "And they're demanding. ", "And they're like, where's Jesse? ", 'We need to know where he is right now. ', 'Where is he going? ', "Where's Jesse? ", "But Sam, he couldn't answer their questions because Yeah, he didn't have the answers. ", "You know, he doesn't know. ", "He has no idea what's going on. ", 'Of course. ', "The Posse doesn't believe him. ", 'They were convinced that Sam, he knew something. ', 'Okay, because they are convinced of this, they get swept up in what becomes this angry mob mentality. ', 'So, they force a man of the house. ', 'The kidnap him. 
', 'The posses thought process was that if they torture, Sam like they can probably get some answers as to where Jesse is, so, The kidnaps him, they hung him from a tree by his neck, pushing Sam, to tell the quote-unquote truth. ', 'Hmm, Sam. ', 'He was telling the truth the whole time. ', "He didn't know where Jesse was but that's not acceptable to the Posse. ", "They're hell-bent on their mission of quote-unquote Justice. ", 'They shot him to death and left his body in the road between Rosewood and Sumner Sam. ', 'Carter was only 45 years. ','Sold. ', 'This is just the tip of the iceberg. ', 'So by that night, the original Posse, which was, you know, it was just a small at first. ', 'Oh, babe babe. ', 'We had now, snowballed into a freaking mob. ', 'Oh, yeah, people were talking and the neighboring towns got word about like what had taken place and that this Manhunt for Jesse was going down. ', 'So a bunch of rando white men. ', 'Come out of the freaking would work. ', "It was like we're we're You guys come from. ", 'They just showed up. ', "There was no official count of how many men joined this Riot, but that's exactly what it turns into a freaking Riot. ", 'And this is when the massacre begins. ', 'I know this is a little heavy. ', "We're just going to take a little moment here, break for a sponsor that helps keep dark history going. ", "Get mouth-watering, seasonal, recipes, and fresh, pre-measured ingredients delivered right to your door with hello fresh, which is America's number one, meal Kit. ", 'Hello fresh makes cooking at home, fun easy and affordable. ', 'They offer 50 menu and Market items each week, including ready to like eat salads sandwiches and soups. ', "There's something for everyone to enjoy. ", 'I love soup, clam chowder. ', 'Sign me up. ', 'Just love it. 
', "Anyways, all these recipes are designed and tested by professional chefs and nutritional experts to make sure that not only is it super delicious, but it's also Oh simple and just good for you. ", 'Hello fresh. ', 'They cut out that stressful meal planning and prepping that you have to do. ', 'So you can just get back to enjoying cooking and getting dinner on the table and just about 30 minutes. ', 'They even have a 20-minute offer. ', "It's like a quick and easy option. ", "So if you're in a rush, you only got 20 minutes. ", 'Hello fresh. ', 'Got you covered. ', "I love hello fresh because it's just really easy. ", 'Honestly, it breaks down the recipe step by step. ', 'So you cannot mess it up and let me tell you, I usually mess everything up. ', 'Usually I start kitchen fires, but with hellofresh, no, kitchen fires. ', "It's so easy. ", 'I can do it. ', "And that's saying a lot. ", "If you're interested, I would suggest you go to hellofresh.com /, dark history, 14 and use my code dark history. ", '14 for up to 14 free meals, including free shipping. ', 'Thank you. ', "Hello, fresh for partnering with me on today's episode. ", 'The death of poor Sam. ', 'Carter was the Catalyst for what was about to become. ', 'The Rosewood Massacre. ', 'The Posse is still out searching for Jesse Hunter. ', 'Remember, the prison escapee who the mob believed beat Fanny. ', "But now, but now the Posse searching for Jesse, it's it's grown bigger and bigger. ", 'And more white men are joining this mob from all over Florida, because his mom had grown to, like such a big, a big site members of the community families, both black and white. ', 'You could sense that things were getting out of control. ', 'Violence was a bruin and this no longer feels like it was just a search for Jesse 9A. ', 'It was turning more aggressive. ','Angry hate-filled, as this white mob was Now, using this as an excuse to destroy everything in sight. 
', "So the black community is terrified by what they're seeing happening in their own town of Rosewood. ", 'They thought that maybe if they got together in a large group, And like some of their homes, they would be safety in large numbers protecting them. ', "So they did that a couple of days go by and on day three, there's another rumor spreading around that. ", 'Jesse was actually hiding in one of the homes that they missed back in Rosewood and the home was none. ', 'Other than Sarah carriers house to remember, Sarah carrier from earlier. ', "She's the one that who did the laundry for the tailors who told the truth? ", 'Who had the L story, Sarah. ', 'Yeah, Sarah, son Sylvester carrier. ', 'He also lived in the house with her. ', 'Now. ', "There was some tension between Sylvester and the people of the white Community because he was known for speaking up for himself and protecting the women in his family, which I know what's wrong with that. ", "There's absolutely nothing good for him. ", 'Good for him. ', 'I know. ', 'I agree with you. ', 'But again, this is the 1920s. ', "It wasn't smart, won't go smart. ", 'Just stand up for yourself or others towards white person, unless you wanted some issues. ', "So with all that being said, it's kind of not surprising when the Posse looked at Sylvester as their next Target. ", 'Now, in addition to this, angry mob coming their way Sylvester, and Sarah also had another problem. ', 'They were keeping a lot of children from the community within their home. ', "Although the exact number wasn't known. ", "You see their home was close to a swamp which in turn made a great location to hide because it provided cover and it wasn't easy to walk around in. ", 'Making it hard to be followed or even get caught. ', 'Unfortunately, the swamp would come to be their best protection on the fourth day of this Riot. ', 'January 4th. ', 'The white mob is headed for the carrier home. ', "It's a Wester wasn't going to just hide and do nothing. 
", 'He was going to defend himself his family, the children. ', "In his property and it was about to get violent when the mob gets to the carrier's house. ", 'The first thing they do. ', "I'm sorry, but they shoot the carrier's, dog. ", 'I know, they just like freaking, shoot, the dog. ', 'One of the guys in the mob. ', 'He calls out to Sylvester. ', "He's like, hey, come out here and present yourself that come face us. ", "So some time goes by and they're getting no response. ", "So now they're frustrated. ", "They're pissed off to of the white men from the They decide to walk up onto the porch and start kicking down the front door. ", 'These two men were Henry, Andrews, and Polly Wilkerson. ', "And it's Wilkerson who ends up kicking in the front door. ", 'So, inside of the home was Sylvester and Sarah amongst the children. ', 'There was a nine-year-old girl named Minnie Lee Mitchell Langley. ', 'She said that Sylvester saved her that day many recalled when Sylvester put her in a safe spot under the stairway while They got ready to fight. ', 'She also said that he got behind her. ', 'At one point, put a gun on her shoulder pointing it at the front door and waited for poly Wilkerson to kick the door down and when Paulie finally kicked that door down Sylvester started shooting to defend themselves. ', 'They would end up. ', 'Killing both Andrews and Wilkerson. ', 'Great, you know, good. ', 'Can I see that? ', 'Why did, but listen once the mob realized that Andrews and Wilkerson were shot and killed. ', 'It was not good. ', 'They were raging. ', 'Okay, they were livid, they were fuming and a gunfight then ensued. ', 'Sadly, Sarah carrier, and her son Sylvester in the middle of this. ', 'Crossfire were both, tragically murdered. ', "The mob keeps shooting and they don't freaking Who they're hitting the only reason they eventually stopped firing was because they ran out of bullets. ", 'Great. 
', "Now, we don't know how many men women and children hiding in the house was Sarah and Sylvester survived if any but what we do know is that the ones who were able to sneak out of the house they went and they hid in the swamp they escaped. ", "But not before they saw things that they wouldn't ever forget. ", 'One of the children are net Goings. ', 'Has spoke later about their struggle to escape going. ', 'Said, he remembers some staying out for two or three long January nights. ', 'In the cold swamp, scared for their lives. ', 'But after days of waiting, they were able to escape the mob. ', 'Now, the guys in the mob are pumped with adrenaline and anger and they are ready to cause more destruction. ', 'They decided to set fire across several houses and a church on their way back home. ', 'So this was no longer about Fanny or I see this was just a mission to destroy any property owned by the black community in Rosewood and the surrounding areas with the mob, on the move, the black people living in Rosewood began to flee, the town in a hurry. ', 'They knew that the mob would be coming back. ', 'Wanting revenge for the murder of Andrews and Wilkerson. ', "They weren't going to just let that go so many head in the cover of the swamp. ", 'Like the kids did thinking it would be hard to find them or catch them in there. ', 'With in Rosewood lived, a man named John right now. ', 'John Wright was well known within the black community. ',"He was a white man, but he was friendly to his neighbor's. ", 'Again. ', 'This is a different time, period, right? ', 'And John, he wanted to help protect them in their time of need. ', "He's seeing out his window. ", "What's going down in Rosewood, and he wants to open up his store and help the community. ", "These are his neighbor's, his friends, his family. ", "He can't just sit back and watch it because John was white, the mob. ", 'Even think about burning down his house. ', 'No. ', 'No, they did not, they were like, nope. 
', "He's white. ", "He's good, save his house, but burn all these over here. ",'There were also white families in Sumner who sheltered and protected, people who they knew often, people who worked for them Friday morning on January 5th, 1923. ', "It's now day. ", 'Five of the Rosewood Massacre things have been pretty bad so far, but shits about to get even more real. ', 'Two hundred armed White. ', "Men comes storming into Rosewood and they're coming from all over the state to be a part of this action with those two hundred armed white men. ",'Some are even members of the Klu Klux Klan. ', 'As you can, probably imagine. ', 'This is not good. ', 'This is very bad. ', 'Why are they even here? ', 'How did they get? ', "He's like, why are they here? ", 'What are they doing here? ', "There isn't even much information as to how these guys knew what was going on. ", 'But what we do know is that they showed up. ',"So the group, it's A lot larger, right? ", 'And they were causing way more damage and destruction. ', 'They burned more homes. ', 'They burn the second Church, the Masonic lodge and the schoolhouse. ', 'They even burned the baseball field to just everything complete and utter destruction. ', "Everything is gone except John Wright's house. ", "So, at this point, it's hard to know how many black residents are still left in their Town. ", 'Those who could they ran for the swamp? ', 'But only those who were too old or sick to run. ', 'They were trapped there, and the members of the mob shot them on site, such as Lexi Gordon. ', 'She was a Rosewood resident. ', 'She ran from her burning home that day trying to escape when she was shot in the back and killed, not just killed. ', 'She was murdered the same day. ', 'A man named Mingo Williams was also murdered. ', 'He was shot in the head. ', "He wasn't even from Rosewood. ", 'He was literally just in the wrong. ', 'Place at the wrong time Mingo was the seventh recorded person to die in Rosewood so far. 
', "There's Sam, Carter Sylvester carrier his mother Sarah, Lexi garden and now Mingo Williams who are killed by the mob, murdered by the mob and then there's Henry Andrews and Polly Wilkerson who were killed while storming, the carrier house. ", "So in the early hours of day 6, it's around 4 a.m. ",'Some help. ', 'Finally comes for. ', 'Oh, those who were trying to escape destruction by this time, word got out about what was going on in Rosewood. ', 'And everybody was talking about it. ', 'Right? ', 'Brothers, John and William, Bryce also got word about what was taking place in Rosewood. ', 'I know, I know. ', 'I know. ', "I don't know. ", "What's a lot of names. ", "I'm throwing at you, but who are John and William Bryce? ",'Okay. ', "Well, they, they worked as train conductors who happen to pass through the Rosewood train station all of the time and the Rosewood train station was one of the Only things that wasn't burned down. ", 'So when they found out what was happening taken place, what was going down in Rosewood? ', 'They knew that they had to help out their community. ', 'So the brothers pulled their train into the station to allow women and children, who had been hiding in the swamp, hop on the train and get the heck out of Rosewood as fast as they could with a train full of people. ', 'The brace Brothers stopped in different cities along the Route, allowing the survivors to get off the train. ','Rain and attempt to begin their lives. ', 'Again and a new city. ', 'They had left everything behind they escaped near death, and now they were forced to live in fear and never return to their homes and Rosewood. ', 'Again, to this day, some of the victims descendants still live in the nearby towns that they went to like Gainesville Florida. ', "Now, let's take a little pause for a break Squarespace, empowers millions of dreamers makers and Jewelers by providing them with the tools. ", 'They need to bring their creative ideas with Squarespace. 
',"It's an all-in-one platform where you can build a website, claim a domain sell products or even just Market your brand Squarespace, has made it easier than ever to establish your online presence. ", "It's the perfect option for people who are ready to take, you know that next step and like make their ideas come to life because you can personalize everything when it comes to your website, it can truly. ", 'Be a reflection of who you are and like what your brand stands for. ', "It's so easy to use. ", 'If you are like new to this at all. ', "It's not intimidating. ", "If this sounds up your alley, I would suggest you head to squarespace.com dark history for a free trial and when you're ready to launch your new website, use the offer code dark history to save, 10% off your first purchase of a website or domain big. ", 'Thank you to Squarespace. ', 'Hey, Squarespace on Sunday.\n']

sentence_test = ["I love you","I are go","who are you","Today's weather is very good","Today's wether is very good","shiu yu is the project name for solution challenge"]

In [5]:
origin = ['Welcome to the huberman Lab podcast, where we discuss science and science based tools for everyday life. ', "I'm Andrew huberman, and I'm a professor of neurobiology and Ophthalmology at Stanford school of medicine. ", 'Today. ', 'We are going to discuss sugar in particular how our nervous system regulates our sugar intake and are seeking of sugar. ', "We're also going to discuss how sugar regulates our nervous system and as you'll soon, learn sugar really impacts our brain and body by two main mechanisms. ", 'Isms, one of those mechanisms is based on the Sweet Taste of sugar which itself is rewarding. ', "Even if you're not much of a sweet tooth. ", 'I confess. ', "I'm not most people enjoy sweet tastes more than bitter tastes and the Sweet Taste of sugar. ", 'And its various forms is strongly reinforcing. ', 'Meaning it triggers the activation of neurons nerve cells in the brain and body that make us want to consume more of that sweet substance. ', 'Incidentally. ', 'Sweet taste also make us want to eat more of other substances. ', 'Senses as well. ', 'You may be familiar with that phenomenon. ', 'Now sugar also triggers mechanisms in the brain and body based on its nutritive content independent of its sweetness. ', 'What that means is that the actual caloric content and the way that Sugar interacts with your nervous system at a subconscious level without your awareness, also impacts your craving and seeking of sugar and other Foods today. ', "We are going to discuss what happens when you ingest sugar in terms of your body's reaction. ", 'Your brains reaction. ', "We are also going to talk about what happens when you don't ingest enough sugar, because it turns out sugar is such a powerful fuel for the brain that under conditions, where people don't ingest enough sugar or where their so-called blood glucose, which is basically a blood sugar of a particular form gets too low there neurons, don't function as well. 
", 'That said, their conditions of very low blood sugar in which neurons can function even better. ', 'So today we are going to talk about the ins and outs the ups and downs of sugar. ', "As it relates to your nervous system and by the end of this episode, I'm confident that you have a much clearer picture as to how much sugar you should be ingesting, whether or not you should avoid sugars that you're currently eating, and you will certainly understand much more about the energy and fuel sources that your brain relies on which I'm certain, will allow you to make better informed choices about the foods, you eat, and avoid toward mental health, physical health, and performance. ", "I'm pleased to announce that I'm hosting to Live Events. ",'This may the first live event will Hosted in Seattle, Washington on May 17th. ', 'The second live event will be hosted in Portland, Oregon on May 18th. ', 'Both are part of a lecture series entitled, the brain-body contract during which I will discuss science and science based tools for mental health, physical, health, and performance. ', "I should point out that while some of the material, I'll cover will overlap with information covered here on the huberman Lab podcast and on various social media posts. ", 'Most of the information I will cover is going to be distinct from information covered on the podcast. ', 'Or elsewhere. ', "So once again, it's Seattle on May 17th. ", 'Portland on May 18th. ','You can access tickets by going to human lab.com, / tour and I hope to see you there. ', "Before we begin, I'd like to emphasize that this podcast is separate from my teaching and research roles at Stanford. ", 'It is however, a part of my desire and effort to bring zero cost to Consumer information about science and science related tools to the general public in keeping with that theme. ', "I'd like to thank the sponsors of today's podcast. ", 'Our first sponsor is thesis thesis makes what are called? 
', 'Nootropics, which means smart drugs. ', 'Now, to be honest. ', 'I am not a fan of the term nootropics. ', "I don't believe in smart drugs. ", "In the sense that I don't believe that there's any one substance, or collection of substances that can make us smarter. ", 'I do believe based on science. ', 'However, that there are particular, neural circuits and brain functions that allow us to be more focused, more alert access creativity, be more motivated at cetera. ', "That's just the way that the brain works different, neural circuits for different brain States. ", "And so the idea of a nootropic that's just going to make a Smarter all around fails to acknowledge that. ", 'Smarter is many things, right? ', "If you're an artist or a musician, you're doing math, you're doing accounting a different part of the day. ", 'You need to be creative. ', 'These are all different brain, processes thesis understands this. ', 'And as far as I know that the first nootropics company to create targeted, nootropics for specific outcomes. ', 'They only use the highest quality ingredients, which of course is essential. ', "Some of those I've talked about on the podcast, things like DHA, ginkgo biloba, phosphatidyl serine. ", 'They give you the ability to try several different blends over the course of a month. ', 'Discover which nootropics work best for your unique brain, chemistry and genetics and goals. ', "And with that personalization design, a kit of nootropics that's ideal for the different brain and body States. ", 'You want to access. ', "I've been using thesis for more than six months now and I can confidently say that their nootropics have been a total Game Changer. ", "My go-to formula is the clarity formula or sometimes I'll use their energy formula before training to get your own personalized. ", 'Nootropic starter kit. ', 'Go online to take thesis.com huberman, take a A minute quiz and thesis will send you four different formulas to try in your first month. 
', "That's take thesis.com huberman and use the code huberman at checkout for 10% off. ", 'Your first order. ', "Today's episode is also brought To Us by athletic greens now called a G1. ", 'Athletic greens is an all-in-one vitamin mineral, probiotic drink. ', "I've been taking athletic greens since 2012. ", "So I'm delighted that they're sponsoring the podcast. ", 'The reason I started taking athletic Greens in the reason. ', 'I still take out letter greens once or twice a day, is that it helps me, cover all of my bases. ', 'Nutritional needs. ', 'It makes up for any deficiencies that I might have. ', 'In addition. ', 'It has probiotics which are vital for microbiome Health. ', "I've done a couple of episodes now on the so-called gut microbiome and the ways in which the microbiome interacts, with your immune system, with your brain to regulate mood and essentially with every biological system relevant to health throughout your brain and body without let it greens. ", 'I get the vitamins. ', 'I need the minerals. ', 'I need and the probiotics to support my microbiome. ', "If you'd like to try out, let it greens, you can go. ", 'Two athletic greens.com huberman and claim a special offer. ', "They'll give you five free travel packs, which make it easy to mix up athletic greens while you're on the road, plus a year supply of vitamin D3 K to our ton of data. ", "Now showing that vitamin D3 is essential for various aspects of our brain and body Health, even if we're getting a lot of sunshine, many of us are still deficient in vitamin D3. ", 'And K2 is also important because it regulates things like cardiovascular function calcium in the body. ', 'And so on again, go to athletic greens.com hubermann. ', "To claim the special offer of the five free travel packs and the year supply of vitamin D3 K to today's episode is also brought To Us by inside tracker inside trackers, a personalized nutrition platform. 
", 'That analyzes data from your blood and DNA to help you better understand your body and help you reach your health goals. ', "I've long been a believer in getting regular blood work done. ", 'For the simple reason that many of the factors that impact your immediate and long-term Health can only be assessed with a quality blood test. ', "What's unique about inside tracker. ", 'Is that while there are a lot of different Tests out there for hormones and metabolic factors Etc with inside tracker. ', 'You get the numbers back in terms of your levels, but they also give you very clear. ', 'Directives. ', "In terms of Lifestyle nutrition and supplementation that can help you bring those values into the ranges that are best for you and your health goals and that's very different than a lot of the other programs where you get a lot of information. ", "We don't really know what to do with that information inside tracker makes that all very easy to understand and very actionable based on the very easy-to-use dashboard at inside tracker. ", "If you'd like to try inside track, You can visit inside tracker.com huberman to get 20% off any of inside. ", 'Trackers plans. ', 'Just use the code huberman at checkout. ', "Okay, let's talk about sugar. ", "Let's talk about how sugar impacts your brain and how your brain impacts your pursuit or your avoidance of sugar. ", "Let's get a few things out of the way. ", 'First. ', "The first thing is that there's nothing inherently bad about sugar. ", "I know the word sugar gets a bad rap nowadays and indeed, you're going to hear over and over again. ", 'During this podcast that consuming a lot of refined sugars in particular, high fructose, corn syrup is known to have a very large number of bad effects on the brain and body. ',"I don't know that there's any one that really debates that anymore, even if we just agree and I think we should all agree on the so-called calories in calories out principle, right? 
", "It's a principle of thermodynamics that if we ingest more energy than we burn, we are going to gain weight. ", 'If we ingest less energy than we burn, we are generally going to lose weight and if The two things are imbalance ingestion and burning of energy. ', "Well, then we're going to maintain weight. ", 'So everyone agrees on that. ', 'I agree on that. ', "But beyond that there are a number of ways in which particular nutrients in the case of today's episode sugar impact. ", 'The way that the brain works such that we tend to seek out more of particular nutrients. ', 'For instance. ', 'If we eat sugar, there are two or at least two mechanisms by which we will crave more sugar. ', 'I think. ', 'People are aware of that experience. ', "But today I'm going to explain exactly how that works. ", 'But also that when we ingest sugar, it has a bunch of different effects on the way that our neural circuits work that can allow us to be more or less focused more or less agitated more or less happy more or less depressed in some cases. ', 'So today as we explore this thing. ', "We're calling Sugar, We're Going to explore that mainly in the context of the nervous system, but also in the context of how the nervous system regulates many many. ", 'Functions and behaviors that are important to all of you, your ability to think your ability, to exercise, your ability to gain weight lose weight. ', 'Whatever your goals might happen, to be sugar, plays a critical role in achieving those goals. ', "And in some cases, if you're ingesting too much at the wrong times of the wrong forms, sugar can actually impede those goals. ", 'In fact, sugar can prevent all the right behaviors from allowing you to achieve the goals that you want. ', 'So today we are going to place sugar into its proper context. ', 'The way I want to start. ', 'Start off by doing that is to tell you a little bit of what happens when we eat and a little bit of what the brain does to respond to those events. 
', 'So what happens when we eat? ', "Well, I've done an entire episode on metabolism. ", "So if you're interested in the full Cascade of hormonal and neural events that occurs when we eat, please check out that episode. ", "But for sake of today's discussion, let's just take a what I call Top Contour view of the hormonal response to ingesting food. ", 'Now, any time we eat, that is the consequence of a number of things that happen. ', "Before we ate, there's a hormone in our brain and body called ghrelin. ", 'Spelled GH reli. ', "N ghrelin is a hormone that increases depending on how long it's been since we ate last. ", 'Okay. ', "So the longer it's been since we had a meal ghrelin levels are going to be higher and higher and higher and it essentially makes us hungry by interacting with particular neurons in an area of the brain called the arcuate nucleus of the hypothalamus.- ", 'and some other areas as well, like the lateral hypothalamus. ', "You don't need to know the names of those brain areas. ", "But if you'd like to know them, there they are ghrelin increases. ", 'It tends to make us hungry. ', 'And then when we eat typically, what happens is ghrelin levels, go down. ', "So it's a very logical system. ", 'Now when we eat assuming that we eat carbohydrates, but even if we just eat some protein and some fats, we will experience a slight or in some cases, a large rise in blood. ','Glucose blood. ', "Glucose is simply blood sugar and the body and brain, we should say particular the nervous system doesn't function. ", 'Well, if blood sugar is too high or too low. ', 'So as a consequence we have another hormone, which is released from the pancreas, which is called insulin which helps regulate the amount of glucose in the bloodstream. ', 'So even if you were to ingest, an entire cup and 8 ounce cup of pure table sugar, which would send your blood glucose very very high assuming that you have a normal. 
',"Insulin response that you're not diabetic that insulin response would help clamp that blood glucose level so that it did not cause damage to your brain and body. ", "Because if blood sugar goes too high, it's actually toxic to neurons and other cells of your body can kill them off and neurons of the central nervous system in the brain and spinal cord. ", 'Once they are dead. ', 'They do not come back. ', 'So your biological systems understand this at a biological level that is and prevent that death of cells due to high blood sugar by keeping. ', 'Insulin around in order to clamp blood glucose diabetics. ', "We call them type 1 diabetics who don't make insulin have to take insulin when they eat in particular, when they foods that raise their blood sugar specifically to avoid that neurotoxicity in the other deleterious effects of high blood sugar. ", 'Okay. ', 'So ghrelin is a hormone that goes up the longer. ', "It's been since we've eaten it tends to stimulate hunger when we eat ghrelin is suppressed blood glucose typically goes up, especially when we a carbohydrate containing meal. ", 'When blood glucose goes up, its regulated in the body meaning its Peaks and its valleys are more less smoothed out and that glucose is sequestered. ', "It's taken away where it needs to be taken away. ", "And in certain locations, it's delivered to cells so that those cells can use the glucose. ", 'Now, one of the chief organs for glucose utilization is the brain neurons are tremendously metabolically active and their preferred mode of metabolism is glucose metabolism. ', "The words neurons basically run on sugar, which is not to say that you should eat a lot of sugar as you'll see today. ", 'There are states of mind and body. ', 'For instance, fasted States in which people report having immense amounts of mental Clarity and their blood glucose is actually quite low. ', 'So it is simply not the case that the more sugar that you ingest the better that your brain will function. 
', 'But it is the case that for most people, meaning people who are not on a ketogenic or very low carbohydrate diet. ', "They're not adapted to low carbohydrate. ", 'Diets that neurons in their brain and body are using glucose in order to function. ', "That's what allows those neurons to fire electrical, potentials. ", "That's how we refer to it. ", 'Firing meaning, sending electrical signals down their lengths to communicate with other neurons. ']   


In [12]:
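# Tokenize each transcript sentence separately (one BatchEncoding per sentence).
# Appending avoids the fixed 500-slot list, which breaks for longer transcripts.
encodings_podcast = []
for sentence in podcast_test:
  enc = tokenizer(sentence, return_tensors="pt")
  encodings_podcast.append(enc)
  print(enc)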

{'input_ids': tensor([[17250,  2460,    13,   220]]), 'attention_mask': tensor([[1, 1, 1, 1]])}
{'input_ids': tensor([[25082,  3583,   393,  3635,   611,   345,   821,  4964,   625,   319,
          7444,    13,   220]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
{'input_ids': tensor([[10814,    11,   703,   389,   345,  1804,    30,   220]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}
{'input_ids': tensor([[  40, 2911,  345,  821, 1719,  257, 7932, 1110,  523, 1290,   13,  220]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
{'input_ids': tensor([[ 3666,  1438,   318, 20330,  1055,  8836,   272,   290,   428,   318,
           262,  3223,  7443, 16036,    13,   220]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
{'input_ids': tensor([[  54, 6391,   11,  314,  761,  257, 7505, 3496,  329, 3223, 2106,   13,
          220]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In [14]:
import torch

# `model`, `tokenizer` and `device` come from earlier cells.
max_length = model.config.n_positions  # longest input the model accepts
nlls = []
ppls = []
for j in range(len(podcast_test)):
  # Score each sentence on its own (no sliding window): the whole sentence is
  # both the input and the target, truncated to the model's context size.
  input_ids = encodings_podcast[j].input_ids[:, :max_length].to(device)
  trg_len = input_ids.size(1)
  target_ids = input_ids.clone()
  with torch.no_grad():  # inference only: no gradients are tracked, no backpropagation
    outputs = model(input_ids, labels=target_ids)
    # outputs.loss is the mean NLL over the trg_len - 1 shifted predictions, so this
    # total overstates the sentence's summed NLL by a factor of trg_len / (trg_len - 1).
    neg_log_likelihood = outputs.loss * trg_len
  nlls.append(neg_log_likelihood)
  ppl = torch.exp(neg_log_likelihood / trg_len)  # equals exp(outputs.loss)
  ppls.append(ppl)
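The loop above produces one NLL and one PPL per sentence. If a single perplexity for a whole transcript is wanted, a reasonable approach is to weight each sentence's mean loss by the number of tokens it was averaged over, rather than averaging the per-sentence perplexities or dividing by one sentence's length. The following is a minimal pure-Python sketch of that aggregation; `corpus_perplexity` is a hypothetical helper (not a Transformers API), and the example numbers are made up for illustration.

```python
import math


def corpus_perplexity(sentence_stats):
    """Token-weighted perplexity over independently scored sentences.

    sentence_stats: iterable of (mean_nll, n_scored) pairs, where mean_nll is
    the per-token loss returned for one sentence (e.g. outputs.loss.item())
    and n_scored is the number of tokens that loss was averaged over.
    """
    total_nll = 0.0
    total_tokens = 0
    for mean_nll, n_scored in sentence_stats:
        total_nll += mean_nll * n_scored  # undo the per-sentence averaging
        total_tokens += n_scored
    return math.exp(total_nll / total_tokens)


# Hypothetical numbers: a short, high-loss sentence and a longer one.
stats = [(4.0, 3), (2.0, 12)]
print(corpus_perplexity(stats))  # exp(36 / 15) = exp(2.4) ≈ 11.02

# Averaging perplexities instead lets the short sentence dominate.
mean_of_ppls = sum(math.exp(loss) for loss, _ in stats) / len(stats)
print(mean_of_ppls)  # ≈ 30.99
```

With a causal LM and `labels=input_ids`, a sentence of `n` tokens contributes `n - 1` scored tokens, because the model shifts the labels by one position internally.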

In [15]:
print(nlls)
print(ppls)



[tensor(21.5929, device='cuda:0'), tensor(56.1166, device='cuda:0'), tensor(21.2079, device='cuda:0'), tensor(32.8869, device='cuda:0'), tensor(89.5587, device='cuda:0'), tensor(62.4577, device='cuda:0'), tensor(137.3782, device='cuda:0'), tensor(34.4666, device='cuda:0'), tensor(26.7866, device='cuda:0'), tensor(22.0678, device='cuda:0'), tensor(20.1689, device='cuda:0'), tensor(28.5108, device='cuda:0'), tensor(100.9797, device='cuda:0'), tensor(80.2885, device='cuda:0'), tensor(17.4487, device='cuda:0'), tensor(34.2462, device='cuda:0'), tensor(25.5189, device='cuda:0'), tensor(29.1281, device='cuda:0'), tensor(39.4148, device='cuda:0'), tensor(24.5080, device='cuda:0'), tensor(23.0595, device='cuda:0'), tensor(45.0938, device='cuda:0'), tensor(45.8344, device='cuda:0'), tensor(58.4558, device='cuda:0'), tensor(21.2177, device='cuda:0'), tensor(131.8104, device='cuda:0'), tensor(24.9970, device='cuda:0'), tensor(74.3297, device='cuda:0'), tensor(79.9861, device='cuda:0'), tensor(103

In [16]:
import torch

# Assumes `encodings_podcast2` holds the tokenized `origin` sentences, e.g.:
# encodings_podcast2 = [tokenizer(s, return_tensors="pt") for s in origin]
max_length = model.config.n_positions  # longest input the model accepts
nlls2 = []
ppls2 = []
for j in range(len(origin)):
  # Same per-sentence scoring as for `podcast_test`.
  input_ids = encodings_podcast2[j].input_ids[:, :max_length].to(device)
  trg_len = input_ids.size(1)
  target_ids = input_ids.clone()
  with torch.no_grad():  # inference only: no gradients are tracked, no backpropagation
    outputs = model(input_ids, labels=target_ids)
    # Mean NLL times trg_len; overstates the summed NLL by trg_len / (trg_len - 1).
    neg_log_likelihood = outputs.loss * trg_len
  nlls2.append(neg_log_likelihood)
  ppl2 = torch.exp(neg_log_likelihood / trg_len)  # equals exp(outputs.loss)
  ppls2.append(ppl2)

In [17]:
print(nlls2)
print(ppls2)



[tensor(95.7564, device='cuda:0'), tensor(96.4072, device='cuda:0'), tensor(17.3117, device='cuda:0'), tensor(102.5434, device='cuda:0'), tensor(131.3277, device='cuda:0'), tensor(106.2856, device='cuda:0'), tensor(39.0225, device='cuda:0'), tensor(25.8918, device='cuda:0'), tensor(98.8725, device='cuda:0'), tensor(59.0267, device='cuda:0'), tensor(99.6793, device='cuda:0'), tensor(21.6164, device='cuda:0'), tensor(62.0918, device='cuda:0'), tensor(32.0651, device='cuda:0'), tensor(35.8058, device='cuda:0'), tensor(96.4974, device='cuda:0'), tensor(166.1389, device='cuda:0'), tensor(64.6977, device='cuda:0'), tensor(37.4944, device='cuda:0'), tensor(236.5039, device='cuda:0'), tensor(97.9378, device='cuda:0'), tensor(62.5804, device='cuda:0'), tensor(294.1988, device='cuda:0'), tensor(59.4697, device='cuda:0'), tensor(87.0209, device='cuda:0'), tensor(54.8696, device='cuda:0'), tensor(151.6918, device='cuda:0'), tensor(159.9405, device='cuda:0'), tensor(69.5320, device='cuda:0'), tenso

With 🤗 Transformers, we can simply pass the `input_ids` as the `labels` to our model, and the average negative
log-likelihood per token is returned as the loss. With our sliding-window approach, however, there is overlap in
the tokens we pass to the model at each iteration. We don't want the log-likelihood of tokens we are only treating
as context to be included in our loss, so we set their targets to `-100`, which the loss ignores. For example, with a
stride of `512`, the model will have at least 512 tokens of context when calculating the conditional likelihood of any
one token (provided there are 512 preceding tokens available to condition on).
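The windowing and `-100` masking described above can be sketched in plain Python, independent of any particular model. This is an illustrative sketch, not the Transformers API: `plan_windows`, `window_labels` and `sliding_window_ppl` are hypothetical helpers, and `window_mean_nll` is an assumed callback standing in for the model call (in practice it could wrap `model(input_ids, labels=labels).loss`).

```python
import math

IGNORE_INDEX = -100  # label value that the LM cross-entropy loss skips


def plan_windows(seq_len, max_length, stride):
    """Return (begin, end, trg_len) for each window.

    Only the last trg_len tokens of a window are scored; the earlier ones are
    context that a previous window has already scored.
    """
    if not 0 < stride <= max_length:
        raise ValueError("need 0 < stride <= max_length")
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end  # tokens not scored by any earlier window
        windows.append((begin, end, trg_len))
        prev_end = end
        if end == seq_len:
            break
    return windows


def window_labels(token_ids, begin, end, trg_len):
    """Copy the window's ids as labels, masking the context part with -100."""
    window = token_ids[begin:end]
    n_context = len(window) - trg_len
    return [IGNORE_INDEX] * n_context + window[n_context:]


def sliding_window_ppl(token_ids, max_length, stride, window_mean_nll):
    """window_mean_nll(input_ids, labels) -> mean NLL over non-ignored labels."""
    nll_sum = 0.0
    for begin, end, trg_len in plan_windows(len(token_ids), max_length, stride):
        labels = window_labels(token_ids, begin, end, trg_len)
        nll_sum += window_mean_nll(token_ids[begin:end], labels) * trg_len
    return math.exp(nll_sum / len(token_ids))


ids = list(range(10))
print(plan_windows(10, max_length=4, stride=2))
# [(0, 4, 4), (2, 6, 2), (4, 8, 2), (6, 10, 2)]
print(window_labels(ids, 2, 6, 2))
# [-100, -100, 4, 5]
print(sliding_window_ppl(ids, 4, 2, lambda input_ids, labels: 2.0))
# exp(2.0) ≈ 7.389
```

Each token is scored exactly once: the `trg_len` values always sum to the sequence length, while masked positions only supply context. Multiplying each window's mean loss by `trg_len` is an approximation, since in the first window the very first token is never predicted after the internal label shift.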

Running the sliding-window evaluation with the stride length equal to the maximum input length is equivalent to the
suboptimal, non-sliding-window strategy we discussed above. The smaller the stride, the more context the model has for
each prediction, and the better (lower) the reported perplexity will typically be, at the cost of more forward passes.
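To make the stride/context trade-off concrete, the sketch below records how many preceding tokens the model sees when each position is scored. It reuses the same windowing logic as the sliding-window approach, reimplemented so the block stands alone; `context_per_token` is a hypothetical helper.

```python
def context_per_token(seq_len, max_length, stride):
    """For each position, the number of earlier tokens visible when it is scored."""
    if not 0 < stride <= max_length:
        raise ValueError("need 0 < stride <= max_length")
    context = [0] * seq_len
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        for pos in range(prev_end, end):  # tokens first scored in this window
            context[pos] = pos - begin
        prev_end = end
        if end == seq_len:
            break
    return context


print(context_per_token(12, max_length=4, stride=4))
# [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
print(context_per_token(12, max_length=4, stride=2))
# [0, 1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3]
```

With `stride == max_length`, the first token of every window is predicted from no context at all; with `stride = max_length // 2`, every token after the first window sees at least `max_length - stride` tokens, at the cost of roughly twice as many forward passes.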

When we run the evaluation with `stride = 1024`, i.e. no overlap, the resulting PPL is `19.64`, which is about the same
as the `19.93` reported in the GPT-2 paper. Using `stride = 512`, and thereby the sliding-window strategy, lowers this
to `16.53`. This is not only a more favorable score, but is also calculated in a way that is closer to the true
autoregressive decomposition of a sequence likelihood.
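Since PPL is the exponentiated average negative log-likelihood, each reported score can be converted back into an average loss in nats per token (natural logarithm), which makes the size of the improvement easier to judge:

$$
\overline{\text{NLL}} = \ln \text{PPL}: \qquad \ln 19.64 \approx 2.978, \qquad \ln 16.53 \approx 2.805, \qquad 2.978 - 2.805 = 0.173 \text{ nats per token}
$$

Equivalently, $16.53 / 19.64 \approx e^{-0.17} \approx 0.84$: the sliding-window evaluation lowers the average per-token loss by roughly 0.17 nats.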