
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>


# LLMs with Hugging Face

**Choosing a pre-trained LLM**: In the demo notebook, you saw how to apply pre-trained models to many applications.  You will do that hands-on in this lab, with your main activity being to find a good model for each task.  Use the tips from the lecture and demo to find good models, and don't hesitate to try a few different possibilities.

**Understanding LLM pipeline configurations**: At the end of this lab, you will also do a more open-ended exploration of model and tokenizer configurations.

### ![Dolly](https://files.training.databricks.com/images/llm/dolly_small.png) Learning Objectives
1. Practice finding an existing model for tasks you want to solve with LLMs.
1. Understand the basics of model and tokenizer options for tweaking model outputs and performance.


## Classroom Setup

In [0]:
%run ../Includes/Classroom-Setup

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.


Resetting the learning environment:
| Enumerating serving endpoints...found 5...(0 seconds)
| No action taken

Skipping download of existing archive to "dbfs:/mnt/dbacademy-datasets/large-language-models/v03" 
| Validating local assets:
| | Listing local files...(0 seconds)
| | Validation completed...(0 seconds total)
|
| Skipping the unpacking of datasets to "dbfs:/mnt/dbacademy-users/labuser5455921@vocareum.com/large-language-models/datasets" 
|
| Dataset installation completed (0 seconds)



Importing lab testing framework.



Using the "default" schema.

Predefined paths variables:
| DA.paths.working_dir: /dbfs/mnt/dbacademy-users/labuser5455921@vocareum.com/large-language-models/working
| DA.paths.user_db:     /dbfs/mnt/dbacademy-users/labuser5455921@vocareum.com/large-language-models/working/database.db
| DA.paths.datasets:    /dbfs/mnt/dbacademy-users/labuser5455921@vocareum.com/large-language-models/datasets

Setup completed (10 seconds)

The models developed or used in this course are for demonstration and learning purposes only.
Models may occasionally output offensive, inaccurate, biased information, or harmful instructions.


## Find good models for your tasks

In each subsection below, you will solve a given task with an LLM of your choosing.  These tasks vary from straightforward to open-ended:
* **Summarization**: There are many summarization models out there, and many are simply plug-and-play.
* **Translation**: This task can require more work, since models support different sets of languages and expect the source and target languages to be specified in different ways.  Make sure you invoke your chosen model with the right parameters.
* **Few-shot learning**: This task is very open-ended: you demonstrate your goal to the LLM with just a few examples.  Choosing those examples and phrasing your task correctly can be more art than science.
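For few-shot learning, the "training examples" live in the prompt itself. Below is a minimal sketch of assembling such a prompt. It assumes a simple `Input:`/`Output:` layout with `###` separators; that layout is an illustrative convention, not a format any particular model requires, and `build_few_shot_prompt` is a hypothetical helper.

```python
def build_few_shot_prompt(examples, query, instruction="", separator="###"):
    """Assemble a few-shot prompt from (input, output) example pairs.

    Hypothetical helper: the layout (optional instruction, then
    separator-delimited Input/Output blocks) is an illustrative assumption.
    """
    blocks = [instruction] if instruction else []
    for text, label in examples:
        blocks.append(f"{separator}\nInput: {text}\nOutput: {label}")
    # Leave the final Output open so the model completes it.
    blocks.append(f"{separator}\nInput: {query}\nOutput:")
    return "\n".join(blocks)


prompt = build_few_shot_prompt(
    examples=[("I loved this film!", "positive"), ("Total waste of time.", "negative")],
    query="The acting was superb.",
    instruction="Classify the sentiment of each review.",
)
print(prompt)
```

You could then pass `prompt` to a `text-generation` pipeline. By default that pipeline returns the prompt plus the completion, and the model may keep writing further `###` blocks, so cutting the completion at the next separator is a common post-processing step.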

Recall these tips from the lecture and demo:
* Use the [Hugging Face Hub](https://huggingface.co/models).
* Filter by task, license, language, etc. as needed.
* If you have limited compute resources, check model sizes to keep execution times lower.
* Search for existing examples as well.  It can be helpful to see exactly how models should be loaded and used.
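For the tip about model sizes, a back-of-the-envelope estimate of the weights' footprint is "number of parameters × bytes per parameter" (4 bytes in fp32, 2 in fp16/bf16). The sketch below assumes dense weights and ignores activations, tokenizer files, and framework overhead, so treat the numbers as lower bounds; the helper names are hypothetical.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weights_size_gb(num_params: float, dtype: str = "fp32") -> float:
    """Approximate size of the model weights in (decimal) gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def params_from_size_gb(size_gb: float, dtype: str = "fp32") -> float:
    """Invert the estimate: roughly how many parameters a checkpoint holds."""
    return size_gb * 1e9 / BYTES_PER_PARAM[dtype]


# A 7-billion-parameter model in fp16 needs about 14 GB for its weights alone.
print(weights_size_gb(7e9, "fp16"))
# A 1.63 GB fp32 checkpoint corresponds to roughly 400 million parameters.
print(round(params_from_size_gb(1.63) / 1e6))
```

Checking the download size listed on a model's "Files and versions" tab against this estimate is a quick way to decide whether a candidate fits your compute budget.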

In [0]:
from datasets import load_dataset
from transformers import pipeline

In [0]:
print("Setup complete!")

Setup complete!


### Question 1: Summarization

In this section, you will find a model from the Hugging Face Hub for a new summarization problem. **Do not use a T5 model**; find and use a model different from the one we used in the demo notebook.

We will use the same [xsum](https://huggingface.co/datasets/xsum) dataset.

In [0]:
xsum_dataset = load_dataset(
    "xsum", version="1.2.0", cache_dir=DA.paths.datasets
)  # Note: We specify cache_dir to use predownloaded data.
xsum_sample = xsum_dataset["train"].select(range(10))
display(xsum_sample.to_pandas())



Found cached dataset xsum (/dbfs/mnt/dbacademy-users/labuser5455921@vocareum.com/large-language-models/datasets/xsum/default/1.2.0/082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71)


document,summary,id
"The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water. Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct. Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town. First Minister Nicola Sturgeon visited the area to inspect the damage. The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare. Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit. However, she said more preventative work could have been carried out to ensure the retaining wall did not fail. ""It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten,"" she said. ""That may not be true but it is perhaps my perspective over the last few days. ""Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"" Meanwhile, a flood alert remains in place across the Borders because of the constant rain. Peebles was badly hit by problems, sparking calls to introduce more defences in the area. Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs. The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand. He said it was important to get the flood protection plan right but backed calls to speed up the process. ""I was quite taken aback by the amount of damage that has been done,"" he said. 
""Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."" He said it was important that ""immediate steps"" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans. Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk.",Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank.,35232142
"A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and guests were asked to leave the hotel. As they gathered outside they saw the two buses, parked side-by-side in the car park, engulfed by flames. One of the tour groups is from Germany, the other from China and Taiwan. It was their first night in Northern Ireland. The driver of one of the buses said many of the passengers had left personal belongings on board and these had been destroyed. Both groups have organised replacement coaches and will begin their tour of the north coast later than they had planned. Police have appealed for information about the attack. Insp David Gibson said: ""It appears as though the fire started under one of the buses before spreading to the second. ""While the exact cause is still under investigation, it is thought that the fire was started deliberately.""",Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre.,40143035
"Ferrari appeared in a position to challenge until the final laps, when the Mercedes stretched their legs to go half a second clear of the red cars. Sebastian Vettel will start third ahead of team-mate Kimi Raikkonen. The world champion subsequently escaped punishment for reversing in the pit lane, which could have seen him stripped of pole. But stewards only handed Hamilton a reprimand, after governing body the FIA said ""no clear instruction was given on where he should park"". Belgian Stoffel Vandoorne out-qualified McLaren team-mate Jenson Button on his Formula 1 debut. Vandoorne was 12th and Button 14th, complaining of a handling imbalance on his final lap but admitting the newcomer ""did a good job and I didn't"". Mercedes were wary of Ferrari's pace before qualifying after Vettel and Raikkonen finished one-two in final practice, and their concerns appeared to be well founded as the red cars mixed it with the silver through most of qualifying. After the first runs, Rosberg was ahead, with Vettel and Raikkonen splitting him from Hamilton, who made a mistake at the final corner on his first lap. But Hamilton saved his best for last, fastest in every sector of his final attempt, to beat Rosberg by just 0.077secs after the German had out-paced him throughout practice and in the first qualifying session. Vettel rued a mistake at the final corner on his last lap, but the truth is that with the gap at 0.517secs to Hamilton there was nothing he could have done. The gap suggests Mercedes are favourites for the race, even if Ferrari can be expected to push them. Vettel said: ""Last year we were very strong in the race and I think we are in good shape for tomorrow. 
We will try to give them a hard time."" Vandoorne's preparations for his grand prix debut were far from ideal - he only found out he was racing on Thursday when FIA doctors declared Fernando Alonso unfit because of a broken rib sustained in his huge crash at the first race of the season in Australia two weeks ago. The Belgian rookie had to fly overnight from Japan, where he had been testing in the Super Formula car he races there, and arrived in Bahrain only hours before first practice on Friday. He also had a difficult final practice, missing all but the final quarter of the session because of a water leak. Button was quicker in the first qualifying session, but Vandoorne pipped him by 0.064secs when it mattered. The 24-year-old said: ""I knew after yesterday I had quite similar pace to Jenson and I knew if I improved a little bit I could maybe challenge him and even out-qualify him and that is what has happened. ""Jenson is a very good benchmark for me because he is a world champion and he is well known to the team so I am very satisfied with the qualifying."" Button, who was 0.5secs quicker than Vandoorne in the first session, complained of oversteer on his final run in the second: ""Q1 was what I was expecting. Q2 he did a good job and I didn't. Very, very good job. We knew how quick he was."" The controversial new elimination qualifying system was retained for this race despite teams voting at the first race in Australia to go back to the 2015 system. FIA president Jean Todt said earlier on Saturday that he ""felt it necessary to give new qualifying one more chance"", adding: ""We live in a world where there is too much over reaction."" The system worked on the basis of mixing up the grid a little - Force India's Sergio Perez ended up out of position in 18th place after the team miscalculated the timing of his final run, leaving him not enough time to complete it before the elimination clock timed him out. 
But it will come in for more criticism as a result of lack of track action at the end of each session. There were three minutes at the end of the first session with no cars on the circuit, and the end of the second session was a similar damp squib. Only one car - Nico Hulkenberg's Force India - was out on the track with six minutes to go. The two Williams cars did go out in the final three minutes but were already through to Q3 and so nothing was at stake. The teams are meeting with Todt and F1 commercial boss Bernie Ecclestone on Sunday at noon local time to decide on what to do with qualifying for the rest of the season. Todt said he was ""optimistic"" they would be able to reach unanimous agreement on a change. ""We should listen to the people watching on TV,"" Rosberg said. ""If they are still unhappy, which I am sure they will be, we should change it."" Red Bull's Daniel Ricciardo was fifth on the grid, ahead of the Williams cars of Valtteri Bottas and Felipe Massa and Force India's Nico Hulkenberg. Ricciardo's team-mate Daniil Kvyat was eliminated during the second session - way below the team's expectation - and the Renault of Brit Jolyon Palmer only managed 19th fastest. German Mercedes protege Pascal Wehrlein managed an excellent 16th in the Manor car. Bahrain GP qualifying results Bahrain GP coverage details",Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg.,35951548
"John Edward Bates, formerly of Spalding, Lincolnshire, but now living in London, faces a total of 22 charges, including two counts of indecency with a child. The 67-year-old is accused of committing the offences between March 1972 and October 1989. Mr Bates denies all the charges. Grace Hale, prosecuting, told the jury that the allegations of sexual abuse were made by made by four male complainants and related to when Mr Bates was a scout leader in South Lincolnshire and Cambridgeshire. ""The defendant says nothing of that sort happened between himself and all these individuals. He says they are all fabricating their accounts and telling lies,"" said Mrs Hale. The prosecutor claimed Mr Bates invited one 15 year old to his home offering him the chance to look at cine films made at scout camps but then showed him pornographic films. She told the jury that the boy was then sexually abused leaving him confused and frightened. Mrs Hale said: ""The complainant's recollection is that on a number of occasions sexual acts would happen with the defendant either in the defendant's car or in his cottage."" She told the jury a second boy was taken by Mr Bates for a weekend in London at the age of 13 or 14 and after visiting pubs he was later sexually abused. Mrs Hale said two boys from the Spalding group had also made complaints of being sexually abused. The jury has been told that Mr Bates was in the RAF before serving as a Lincolnshire Police officer between 1976 and 1983. The trial, which is expected to last two weeks, continues.","A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told.",36266422
"Patients and staff were evacuated from Cerahpasa hospital on Wednesday after a man receiving treatment at the clinic threatened to shoot himself and others. Officers were deployed to negotiate with the man, a young police officer. Earlier reports that the armed man had taken several people hostage proved incorrect. The chief consultant of Cerahpasa hospital, Zekayi Kutlubay, who was evacuated from the facility, said that there had been ""no hostage crises"", adding that the man was ""alone in the room"". Dr Kutlubay said that the man had been receiving psychiatric treatment for the past two years. He said that the hospital had previously submitted a report stating that the man should not be permitted to carry a gun. ""His firearm was taken away,"" Dr Kutlubay said, adding that the gun in the officer's possession on Wednesday was not his issued firearm. The incident comes amid tension in Istanbul following several attacks in crowded areas, including the deadly assault on the Reina nightclub on New Year's Eve which left 39 people dead.","An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report.",38826984
"Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau. Rynard Landman and Ashton Hewitt got a try in either half for the Dragons. Glasgow showed far superior strength in depth as they took control of a messy match in the second period. Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee. Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes. It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards. Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert. But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal. The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly. It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg. Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try. The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes. 
However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting. Dragons director of rugby Lyn Jones said: ""We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way. ""Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent. ""It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."" Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe. Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau. Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson. Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott.",Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards.,34540833
"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured when an Audi A3 struck them in Streatham High Road at 05:30 GMT on Saturday. Ten minutes before the crash the car was in London Road, Croydon, when a Volkswagen Passat collided with a tree. Police want to trace Nathan Davis, 27, who they say has links to the Audi. The car was abandoned at the scene. Ms Chango-Alverez died from multiple injuries, a post-mortem examination found. No arrests have been made as yet, police said. Ms Chango-Alverez was staying at her mother's home in Streatham High Road. She was born in Ecuador and had lived in London for 13 years, BBC London reporter Gareth Furby said. At the time of the crash, she was on her way to work in a hotel. The remains of the bus stop, which was extensively damaged in the crash, have been removed. Flowers have been left at the site in tribute to the victim. A statement from her brother Kevin Raul Chango-Alverez said: ""My family has had its heart torn out, at this Christmas time, we will never be the same again. ""On Friday night we were together as a family with Veronica meeting her newly born nephew and preparing for Christmas. ""I last saw her alive as she left to go to work on Saturday morning, but moments later I was holding her hand as she passed away in the street."" Describing the crash as ""horrific"" Det Insp Gordon Wallace, said: ""The family are devastated. The memory of this senseless death will be with them each time they leave their home. ""The driver fled the scene abandoning the grey Audi, which was extensively damaged. ""We are looking to speak to Mr Nathan Davis in relation to this collision."" The 51-year-old man injured at the bus stop remains in a critical condition in hospital while the condition of the 29-year-old driver of the Volkswagen is now stable.",A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police.,20836172
"Belgian cyclist Demoitie died after a collision with a motorbike during Belgium's Gent-Wevelgem race. The 25-year-old was hit by the motorbike after several riders came down in a crash as the race passed through northern France. ""The main issues come when cars or motorbikes have to pass the peloton and pass riders,"" Team Sky's Rowe said. ""That is the fundamental issue we're looking into. ""There's a lot of motorbikes in and around the race whether it be cameras for TV, photographers or police motorbikes. ""In total there's around 50 motorbikes that work on each race. ""We've got a riders union and we're coming together to think of a few ideas, whether we cap a speed limit on how fast they can overtake us. ""Say we put a 10 kilometres per hour limit on it, if we're going 50kph they're only allowed to pass us 60kph or something like that."" Demoitie, who was riding for the Wanty-Gobert team, was taken to hospital in Lille but died later. The sport's governing body, the UCI, said it would co-operate with all relevant authorities in an investigation into the incident. The Professional Cyclists' Association (CPA) issued a statement asking what would be done to improve safety. Despite Demoitie's death, attitudes to road racing will stay the same says Rowe, who has been competing in Three Days of De Panne race in Belgium. ""As soon as that element of fear slips into your mind and you start thinking of things that could happen, that's when you're doomed to fail,"" he told BBC Wales Sport. ""If you start thinking about crashes and the consequences and what could potentially happen then you're never going to be at the front of the peloton and you're never going to win any races."" In a separate incident, another Belgian cyclist, Daan Myngheer, 22, died in hospital after suffering a heart attack during the first stage of the Criterium International in Corsica.",Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie.,35932467
"Gundogan, 26, told BBC Sport he ""can see the finishing line"" after tearing cruciate knee ligaments in December, but will not rush his return. The German missed the 2014 World Cup following back surgery that kept him out for a year, and sat out Euro 2016 because of a dislocated kneecap. He said: ""It is heavy mentally to accept that."" Gundogan will not be fit for the start of the Premier League season at Brighton on 12 August but said his recovery time is now being measured in ""weeks"" rather than months. He told BBC Sport: ""It is really hard always to fall and fight your way back. You feel good and feel ready, then you get the next kick. ""The worst part is behind me now. I want to feel ready when I am fully back. I want to feel safe and confident. I don't mind if it is two weeks or six."" Gundogan made 15 appearances and scored five goals in his debut season for City following his £20m move from Borussia Dortmund. He is eager to get on the field again and was impressed at the club's 4-1 win over Real Madrid in a pre-season game in Los Angeles on Wednesday. Manager Pep Guardiola has made five new signings already this summer and continues to have an interest in Arsenal forward Alexis Sanchez and Monaco's Kylian Mbappe. Gundogan said: ""Optimism for the season is big. It is huge, definitely. ""We felt that last year as well but it was a completely new experience for all of us. We know the Premier League a bit more now and can't wait for the season to start."" City complete their three-match tour of the United States against Tottenham in Nashville on Saturday. Chelsea manager Antonio Conte said earlier this week he did not feel Tottenham were judged by the same standards as his own side, City and Manchester United. Spurs have had the advantage in their recent meetings with City, winning three and drawing one of their last four Premier League games. And Gundogan thinks they are a major threat. He said: ""Tottenham are a great team. 
They have the style of football. They have young English players. Our experience last season shows it is really tough to beat them. ""They are really uncomfortable to play against. ""I am pretty sure, even if they will not say it loud, the people who know the Premier League know Tottenham are definitely a competitor for the title.""",Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury.,40758845
"The crash happened about 07:20 GMT at the junction of the A127 and Progress Road in Leigh-on-Sea, Essex. The man, who police said is aged in his 20s, was treated at the scene for a head injury and suspected multiple fractures, the ambulance service said. He was airlifted to the Royal London Hospital for further treatment. The Southend-bound carriageway of the A127 was closed for about six hours while police conducted their initial inquiries. A spokeswoman for Essex Police said it was not possible comment to further as this time as the ""investigation is now being conducted by the IPCC"".","A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with ""serious life-changing injuries"".",30358490


As in the demo, where we found and applied a summarization model, fill in the missing parts below to create a pipeline from an existing LLM, but with a different model.  Then apply the pipeline to the sample batch of articles.

In [0]:
# Pick a summarization model
summarizer = pipeline("summarization", model="luisotorres/bart-finetuned-samsum", min_length=20, max_length=1024, truncation=True)


In [0]:
# Apply the pipeline to one article
summarizer(xsum_sample["document"][1])

Your max_length is set to 1024, but your input_length is only 186. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=93)


[{'summary_text': 'Two tour buses were destroyed in a deliberate fire at a hotel in Belfast city centre.'}]
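The warning above comes from the summarization pipeline noticing that `max_length=1024` exceeds the input length; its suggested value is the input token count halved (186 tokens gives `max_length=93`). Here is a minimal sketch of a hypothetical helper that reproduces that hint, with one added assumption: it never returns less than the pipeline's `min_length` (20 here), so the two generation limits stay consistent.

```python
def suggest_max_length(input_length: int, min_length: int = 20) -> int:
    """Suggest a per-call max_length for a summarization pipeline.

    Hypothetical helper, not a transformers API: mirrors the
    ``input_length // 2`` hint from the pipeline warning, floored at min_length.
    """
    if input_length <= 0:
        raise ValueError("input_length must be positive")
    return max(input_length // 2, min_length)


# Token counts taken from the warnings printed in this notebook.
for n_tokens in (186, 516, 329):
    print(n_tokens, "->", suggest_max_length(n_tokens))
```

In the notebook you could count tokens with the pipeline's own tokenizer, e.g. `n = len(summarizer.tokenizer(doc)["input_ids"])`, and then call `summarizer(doc, max_length=suggest_max_length(n))`. For inputs shorter than the model's maximum length, that count should match the one reported in the warning.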

In [0]:
# Construct a summarization pipeline
summarizer = pipeline("summarization", model="luisotorres/bart-finetuned-samsum", min_length=20, max_length=1024, truncation=True)

# Apply the pipeline to the batch of articles in `xsum_sample`
summarization_results = summarizer(xsum_sample["document"])
summarization_results

Your max_length is set to 1024, but your input_length is only 516. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=258)
Your max_length is set to 1024, but your input_length is only 186. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=93)
Your max_length is set to 1024, but your input_length is only 329. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=164)
Your max_length is set to 1024, but your input_length is only 220. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_lengt

[{'summary_text': 'The damage caused by flooding in Dumfries and Galloway and the Borders is still being assessed.'},
 {'summary_text': 'Two tour buses were destroyed in a deliberate fire at a hotel in Belfast city centre.'},
 {'summary_text': 'Mercedes will start the Bahrain Grand Prix from pole position. Stoffel Vandoorne out-qualified Jenson Button on his Formula 1 debut.'},
 {'summary_text': 'Former scout leader John Edward Bates is on trial accused of sexually abusing two boys.'},
 {'summary_text': 'A man with mental health issues threatened to shoot himself and others at a hospital in Istanbul. He had been receiving psychiatric treatment for the past two years.'},
 {'summary_text': 'Glasgow Warriors won the match against the Dragons. Simone Favaro scored a last-gasp try with the last move of the game.'},
 {'summary_text': 'Veronica Chango-Alverez, 31, was killed in a car crash in Streatham High Road. She was on her way to work in a hotel. '},
 {'summary_text': 'Demoitie died afte

In [0]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion1_1(summarizer, summarization_results, xsum_sample["document"])

[32mPASSED[0m: All tests passed for lesson1, question1
[32mRESULTS RECORDED[0m: Click `Submit` when all questions are completed to log the results.
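Passing the test only confirms that the pipeline ran. When comparing candidate models, a rough quality number alongside a side-by-side reading can help. The sketch below is a simplified ROUGE-1 F1 (unigram overlap between generated and reference summaries). It is an illustrative pure-Python approximation, not the official `rouge_score` or `evaluate` implementation: it lowercases, keeps only alphanumeric tokens, and does no stemming, so its scores will not match those libraries exactly.

```python
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    """Lowercased alphanumeric unigram counts (no stemming or stopword removal)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand, ref = _tokens(candidate), _tokens(reference)
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Generated vs. reference summary for the second article in xsum_sample.
generated = "Two tour buses were destroyed in a deliberate fire at a hotel in Belfast city centre."
reference = "Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre."
print(round(rouge1_f1(generated, reference), 3))
```

To score the whole batch, you might use `[rouge1_f1(r["summary_text"], ref) for r, ref in zip(summarization_results, xsum_sample["summary"])]` and compare the average across the models you try.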


In [0]:
# Display the generated summary side-by-side with the reference summary and original document.
import pandas as pd

display(
    pd.DataFrame.from_dict(summarization_results)
    .rename({"summary_text": "generated_summary"}, axis=1)
    .join(pd.DataFrame.from_dict(xsum_sample))[
        ["generated_summary", "summary", "document"]
    ]
)

generated_summary,summary,document
The damage caused by flooding in Dumfries and Galloway and the Borders is still being assessed.,Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank.,"The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water. Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct. Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town. First Minister Nicola Sturgeon visited the area to inspect the damage. The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare. Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit. However, she said more preventative work could have been carried out to ensure the retaining wall did not fail. ""It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten,"" she said. ""That may not be true but it is perhaps my perspective over the last few days. ""Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"" Meanwhile, a flood alert remains in place across the Borders because of the constant rain. Peebles was badly hit by problems, sparking calls to introduce more defences in the area. Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs. The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand. He said it was important to get the flood protection plan right but backed calls to speed up the process. 
""I was quite taken aback by the amount of damage that has been done,"" he said. ""Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."" He said it was important that ""immediate steps"" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans. Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk."
Two tour buses were destroyed in a deliberate fire at a hotel in Belfast city centre.,Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre.,"A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and guests were asked to leave the hotel. As they gathered outside they saw the two buses, parked side-by-side in the car park, engulfed by flames. One of the tour groups is from Germany, the other from China and Taiwan. It was their first night in Northern Ireland. The driver of one of the buses said many of the passengers had left personal belongings on board and these had been destroyed. Both groups have organised replacement coaches and will begin their tour of the north coast later than they had planned. Police have appealed for information about the attack. Insp David Gibson said: ""It appears as though the fire started under one of the buses before spreading to the second. ""While the exact cause is still under investigation, it is thought that the fire was started deliberately."""
Mercedes will start the Bahrain Grand Prix from pole position. Stoffel Vandoorne out-qualified Jenson Button on his Formula 1 debut.,Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg.,"Ferrari appeared in a position to challenge until the final laps, when the Mercedes stretched their legs to go half a second clear of the red cars. Sebastian Vettel will start third ahead of team-mate Kimi Raikkonen. The world champion subsequently escaped punishment for reversing in the pit lane, which could have seen him stripped of pole. But stewards only handed Hamilton a reprimand, after governing body the FIA said ""no clear instruction was given on where he should park"". Belgian Stoffel Vandoorne out-qualified McLaren team-mate Jenson Button on his Formula 1 debut. Vandoorne was 12th and Button 14th, complaining of a handling imbalance on his final lap but admitting the newcomer ""did a good job and I didn't"". Mercedes were wary of Ferrari's pace before qualifying after Vettel and Raikkonen finished one-two in final practice, and their concerns appeared to be well founded as the red cars mixed it with the silver through most of qualifying. After the first runs, Rosberg was ahead, with Vettel and Raikkonen splitting him from Hamilton, who made a mistake at the final corner on his first lap. But Hamilton saved his best for last, fastest in every sector of his final attempt, to beat Rosberg by just 0.077secs after the German had out-paced him throughout practice and in the first qualifying session. Vettel rued a mistake at the final corner on his last lap, but the truth is that with the gap at 0.517secs to Hamilton there was nothing he could have done. The gap suggests Mercedes are favourites for the race, even if Ferrari can be expected to push them. Vettel said: ""Last year we were very strong in the race and I think we are in good shape for tomorrow. 
We will try to give them a hard time."" Vandoorne's preparations for his grand prix debut were far from ideal - he only found out he was racing on Thursday when FIA doctors declared Fernando Alonso unfit because of a broken rib sustained in his huge crash at the first race of the season in Australia two weeks ago. The Belgian rookie had to fly overnight from Japan, where he had been testing in the Super Formula car he races there, and arrived in Bahrain only hours before first practice on Friday. He also had a difficult final practice, missing all but the final quarter of the session because of a water leak. Button was quicker in the first qualifying session, but Vandoorne pipped him by 0.064secs when it mattered. The 24-year-old said: ""I knew after yesterday I had quite similar pace to Jenson and I knew if I improved a little bit I could maybe challenge him and even out-qualify him and that is what has happened. ""Jenson is a very good benchmark for me because he is a world champion and he is well known to the team so I am very satisfied with the qualifying."" Button, who was 0.5secs quicker than Vandoorne in the first session, complained of oversteer on his final run in the second: ""Q1 was what I was expecting. Q2 he did a good job and I didn't. Very, very good job. We knew how quick he was."" The controversial new elimination qualifying system was retained for this race despite teams voting at the first race in Australia to go back to the 2015 system. FIA president Jean Todt said earlier on Saturday that he ""felt it necessary to give new qualifying one more chance"", adding: ""We live in a world where there is too much over reaction."" The system worked on the basis of mixing up the grid a little - Force India's Sergio Perez ended up out of position in 18th place after the team miscalculated the timing of his final run, leaving him not enough time to complete it before the elimination clock timed him out. 
But it will come in for more criticism as a result of lack of track action at the end of each session. There were three minutes at the end of the first session with no cars on the circuit, and the end of the second session was a similar damp squib. Only one car - Nico Hulkenberg's Force India - was out on the track with six minutes to go. The two Williams cars did go out in the final three minutes but were already through to Q3 and so nothing was at stake. The teams are meeting with Todt and F1 commercial boss Bernie Ecclestone on Sunday at noon local time to decide on what to do with qualifying for the rest of the season. Todt said he was ""optimistic"" they would be able to reach unanimous agreement on a change. ""We should listen to the people watching on TV,"" Rosberg said. ""If they are still unhappy, which I am sure they will be, we should change it."" Red Bull's Daniel Ricciardo was fifth on the grid, ahead of the Williams cars of Valtteri Bottas and Felipe Massa and Force India's Nico Hulkenberg. Ricciardo's team-mate Daniil Kvyat was eliminated during the second session - way below the team's expectation - and the Renault of Brit Jolyon Palmer only managed 19th fastest. German Mercedes protege Pascal Wehrlein managed an excellent 16th in the Manor car. Bahrain GP qualifying results Bahrain GP coverage details"
Former scout leader John Edward Bates is on trial accused of sexually abusing two boys.,"A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told.","John Edward Bates, formerly of Spalding, Lincolnshire, but now living in London, faces a total of 22 charges, including two counts of indecency with a child. The 67-year-old is accused of committing the offences between March 1972 and October 1989. Mr Bates denies all the charges. Grace Hale, prosecuting, told the jury that the allegations of sexual abuse were made by made by four male complainants and related to when Mr Bates was a scout leader in South Lincolnshire and Cambridgeshire. ""The defendant says nothing of that sort happened between himself and all these individuals. He says they are all fabricating their accounts and telling lies,"" said Mrs Hale. The prosecutor claimed Mr Bates invited one 15 year old to his home offering him the chance to look at cine films made at scout camps but then showed him pornographic films. She told the jury that the boy was then sexually abused leaving him confused and frightened. Mrs Hale said: ""The complainant's recollection is that on a number of occasions sexual acts would happen with the defendant either in the defendant's car or in his cottage."" She told the jury a second boy was taken by Mr Bates for a weekend in London at the age of 13 or 14 and after visiting pubs he was later sexually abused. Mrs Hale said two boys from the Spalding group had also made complaints of being sexually abused. The jury has been told that Mr Bates was in the RAF before serving as a Lincolnshire Police officer between 1976 and 1983. The trial, which is expected to last two weeks, continues."
A man with mental health issues threatened to shoot himself and others at a hospital in Istanbul. He had been receiving psychiatric treatment for the past two years.,"An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report.","Patients and staff were evacuated from Cerahpasa hospital on Wednesday after a man receiving treatment at the clinic threatened to shoot himself and others. Officers were deployed to negotiate with the man, a young police officer. Earlier reports that the armed man had taken several people hostage proved incorrect. The chief consultant of Cerahpasa hospital, Zekayi Kutlubay, who was evacuated from the facility, said that there had been ""no hostage crises"", adding that the man was ""alone in the room"". Dr Kutlubay said that the man had been receiving psychiatric treatment for the past two years. He said that the hospital had previously submitted a report stating that the man should not be permitted to carry a gun. ""His firearm was taken away,"" Dr Kutlubay said, adding that the gun in the officer's possession on Wednesday was not his issued firearm. The incident comes amid tension in Istanbul following several attacks in crowded areas, including the deadly assault on the Reina nightclub on New Year's Eve which left 39 people dead."
Glasgow Warriors won the match against the Dragons. Simone Favaro scored a last-gasp try with the last move of the game.,Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards.,"Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau. Rynard Landman and Ashton Hewitt got a try in either half for the Dragons. Glasgow showed far superior strength in depth as they took control of a messy match in the second period. Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee. Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes. It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards. Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert. But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal. The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly. It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg. 
Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try. The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes. However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting. Dragons director of rugby Lyn Jones said: ""We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way. ""Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent. ""It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."" Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe. Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau. Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson. Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott."
"Veronica Chango-Alverez, 31, was killed in a car crash in Streatham High Road. She was on her way to work in a hotel.",A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police.,"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured when an Audi A3 struck them in Streatham High Road at 05:30 GMT on Saturday. Ten minutes before the crash the car was in London Road, Croydon, when a Volkswagen Passat collided with a tree. Police want to trace Nathan Davis, 27, who they say has links to the Audi. The car was abandoned at the scene. Ms Chango-Alverez died from multiple injuries, a post-mortem examination found. No arrests have been made as yet, police said. Ms Chango-Alverez was staying at her mother's home in Streatham High Road. She was born in Ecuador and had lived in London for 13 years, BBC London reporter Gareth Furby said. At the time of the crash, she was on her way to work in a hotel. The remains of the bus stop, which was extensively damaged in the crash, have been removed. Flowers have been left at the site in tribute to the victim. A statement from her brother Kevin Raul Chango-Alverez said: ""My family has had its heart torn out, at this Christmas time, we will never be the same again. ""On Friday night we were together as a family with Veronica meeting her newly born nephew and preparing for Christmas. ""I last saw her alive as she left to go to work on Saturday morning, but moments later I was holding her hand as she passed away in the street."" Describing the crash as ""horrific"" Det Insp Gordon Wallace, said: ""The family are devastated. The memory of this senseless death will be with them each time they leave their home. ""The driver fled the scene abandoning the grey Audi, which was extensively damaged. 
""We are looking to speak to Mr Nathan Davis in relation to this collision."" The 51-year-old man injured at the bus stop remains in a critical condition in hospital while the condition of the 29-year-old driver of the Volkswagen is now stable."
Demoitie died after a collision with a motorbike during the Gent-Wevelgem race. The UCI will co-operate with all relevant authorities in the investigation.,Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie.,"Belgian cyclist Demoitie died after a collision with a motorbike during Belgium's Gent-Wevelgem race. The 25-year-old was hit by the motorbike after several riders came down in a crash as the race passed through northern France. ""The main issues come when cars or motorbikes have to pass the peloton and pass riders,"" Team Sky's Rowe said. ""That is the fundamental issue we're looking into. ""There's a lot of motorbikes in and around the race whether it be cameras for TV, photographers or police motorbikes. ""In total there's around 50 motorbikes that work on each race. ""We've got a riders union and we're coming together to think of a few ideas, whether we cap a speed limit on how fast they can overtake us. ""Say we put a 10 kilometres per hour limit on it, if we're going 50kph they're only allowed to pass us 60kph or something like that."" Demoitie, who was riding for the Wanty-Gobert team, was taken to hospital in Lille but died later. The sport's governing body, the UCI, said it would co-operate with all relevant authorities in an investigation into the incident. The Professional Cyclists' Association (CPA) issued a statement asking what would be done to improve safety. Despite Demoitie's death, attitudes to road racing will stay the same says Rowe, who has been competing in Three Days of De Panne race in Belgium. ""As soon as that element of fear slips into your mind and you start thinking of things that could happen, that's when you're doomed to fail,"" he told BBC Wales Sport. 
""If you start thinking about crashes and the consequences and what could potentially happen then you're never going to be at the front of the peloton and you're never going to win any races."" In a separate incident, another Belgian cyclist, Daan Myngheer, 22, died in hospital after suffering a heart attack during the first stage of the Criterium International in Corsica."
Manchester City midfielder Ilkay Gundogan will not be fit for the start of the Premier League season at Brighton on 12 August.,Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury.,"Gundogan, 26, told BBC Sport he ""can see the finishing line"" after tearing cruciate knee ligaments in December, but will not rush his return. The German missed the 2014 World Cup following back surgery that kept him out for a year, and sat out Euro 2016 because of a dislocated kneecap. He said: ""It is heavy mentally to accept that."" Gundogan will not be fit for the start of the Premier League season at Brighton on 12 August but said his recovery time is now being measured in ""weeks"" rather than months. He told BBC Sport: ""It is really hard always to fall and fight your way back. You feel good and feel ready, then you get the next kick. ""The worst part is behind me now. I want to feel ready when I am fully back. I want to feel safe and confident. I don't mind if it is two weeks or six."" Gundogan made 15 appearances and scored five goals in his debut season for City following his £20m move from Borussia Dortmund. He is eager to get on the field again and was impressed at the club's 4-1 win over Real Madrid in a pre-season game in Los Angeles on Wednesday. Manager Pep Guardiola has made five new signings already this summer and continues to have an interest in Arsenal forward Alexis Sanchez and Monaco's Kylian Mbappe. Gundogan said: ""Optimism for the season is big. It is huge, definitely. ""We felt that last year as well but it was a completely new experience for all of us. We know the Premier League a bit more now and can't wait for the season to start."" City complete their three-match tour of the United States against Tottenham in Nashville on Saturday. Chelsea manager Antonio Conte said earlier this week he did not feel Tottenham were judged by the same standards as his own side, City and Manchester United. 
Spurs have had the advantage in their recent meetings with City, winning three and drawing one of their last four Premier League games. And Gundogan thinks they are a major threat. He said: ""Tottenham are a great team. They have the style of football. They have young English players. Our experience last season shows it is really tough to beat them. ""They are really uncomfortable to play against. ""I am pretty sure, even if they will not say it loud, the people who know the Premier League know Tottenham are definitely a competitor for the title."""
A man has been seriously injured in a crash on the A127. He was airlifted to the Royal London Hospital for further treatment.,"A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with ""serious life-changing injuries"".","The crash happened about 07:20 GMT at the junction of the A127 and Progress Road in Leigh-on-Sea, Essex. The man, who police said is aged in his 20s, was treated at the scene for a head injury and suspected multiple fractures, the ambulance service said. He was airlifted to the Royal London Hospital for further treatment. The Southend-bound carriageway of the A127 was closed for about six hours while police conducted their initial inquiries. A spokeswoman for Essex Police said it was not possible comment to further as this time as the ""investigation is now being conducted by the IPCC""."


### Question 2: Translation

In this section, you will find a model from the Hugging Face Hub for a new translation problem.

We will use the [Helsinki-NLP/tatoeba_mt](https://huggingface.co/datasets/Helsinki-NLP/tatoeba_mt) dataset.  It includes sentence pairs from many languages, but we will focus on translating Japanese to English.

Hints in case you feel stuck on this task:
* Some models can handle *a lot* of languages.  Check out [NLLB](https://huggingface.co/docs/transformers/model_doc/nllb), the No Language Left Behind model ([research paper](https://arxiv.org/abs/2207.04672)).
* The "translation" task for `pipeline` takes optional parameters `src_lang` (source language) and `tgt_lang` (target language), which are important when the model can handle multiple languages.  To figure out what codes to use to specify languages (and scripts for those languages), it can be helpful to find existing examples of using your model; for NLLB, check out [this Python script with codes](https://huggingface.co/spaces/Geonmo/nllb-translation-demo/blob/main/flores200_codes.py) or similar demo resources.
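A minimal sketch of the hints above: the helper below collects the arguments an NLLB translation pipeline needs. The checkpoint `facebook/nllb-200-distilled-600M` and the helper name `nllb_translation_kwargs` are illustrative assumptions, not something this lab prescribes. The language codes follow the FLORES-200 convention used by NLLB (a language code plus a script code), as in the code list linked above.

```python
# FLORES-200 codes: ISO 639-3 language code + ISO 15924 script code.
# Only the two languages used in this question are listed.
FLORES_CODES = {
    "English": "eng_Latn",
    "Japanese": "jpn_Jpan",
}


def nllb_translation_kwargs(src: str, tgt: str,
                            model: str = "facebook/nllb-200-distilled-600M") -> dict:
    """Build keyword arguments for transformers.pipeline() (hypothetical helper)."""
    return {
        "task": "translation",
        "model": model,                 # assumed checkpoint; any NLLB variant should work
        "src_lang": FLORES_CODES[src],  # language of the input text
        "tgt_lang": FLORES_CODES[tgt],  # language to translate into
    }


kwargs = nllb_translation_kwargs("Japanese", "English")
print(kwargs)
```

Assuming `transformers` is installed, `pipeline(**kwargs)` would then build the translator, which can be applied to `jpn_sample["Japanese"]` like any other translation pipeline. The first call downloads a large checkpoint.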


In [0]:
jpn_dataset = load_dataset(
    "Helsinki-NLP/tatoeba_mt",
    "eng-jpn_Hani",
    cache_dir=DA.paths.datasets,
)
jpn_sample = (
    jpn_dataset["test"]
    .select(range(10))
    .rename_column("sourceString", "English")
    .rename_column("targetString", "Japanese")
    .remove_columns(["sourceLang", "targetlang"])
)
display(jpn_sample.to_pandas())



Found cached dataset tatoeba_mt (/dbfs/mnt/dbacademy-users/labuser5455921@vocareum.com/large-language-models/datasets/Helsinki-NLP___tatoeba_mt/eng-jpn_Hani/0.0.0/01e819f3f64a772a2ca70949061d295d3a2dc99d05183fe4776a3be23f75f619)



English,Japanese
Absolutely!,絶対！
A campaign is underway throughout the company to achieve economy in the use of copying paper.,コピー紙の節約運動が全社的に展開されている。
Accrued interest will be paid into your account.,生じた利息は貯金口座に入金されます。
A child is very sensitive to its mother's love.,子供は母親の愛情にとても敏感だ。
Acid rain isn't a natural phenomenon.,酸性雨は自然現象ではない。
A female friend of ours took a trip to a small village last week.,我々の女の友達は先週小さな町へ旅行しました。
Affection sprang up between them.,二人の間に愛情が芽生えた。
"After consideration, the company president made a large scale change to the management strategy.",社長は逡巡した後に、大規模な経営戦略の転換を図った。
"After I had done my homework, I went to bed.",宿題を終えた後で私は寝た。
"After supper, I washed the dishes.",夕食後、私は皿を洗った。



Just as we previously found and applied a model for translating between other languages, you must now find a model to translate from Japanese to English.  Fill in the missing parts below to create a pipeline using an existing LLM.  Then apply the pipeline to the sample batch of Japanese sentences.

In [0]:
# Use the Helsinki-NLP Opus-MT model for Japanese-to-English translation
translator_ja_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ja-en")



In [0]:
jpn_sample["Japanese"][0]

'絶対！'

In [0]:
translator_ja_en(jpn_sample["Japanese"][0])

[{'translation_text': 'Never!'}]

In [0]:
# Construct a pipeline for translating Japanese to English.
translation_pipeline = pipeline("translation", model="Helsinki-NLP/opus-mt-ja-en")

# Apply your pipeline on the sample of Japanese text in: jpn_sample["Japanese"]
translation_results = translation_pipeline(jpn_sample["Japanese"])

In [0]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion1_2(translation_pipeline, translation_results, jpn_sample["Japanese"])

[32mPASSED[0m: All tests passed for lesson1, question2
[32mRESULTS RECORDED[0m: Click `Submit` when all questions are completed to log the results.


In [0]:
# Now we can display your translations side-by-side with the ground-truth `English` column from the dataset.
translation_results_df = pd.DataFrame.from_dict(translation_results).join(
    jpn_sample.to_pandas()
)
display(translation_results_df)

translation_text,English,Japanese
Never!,Absolutely!,絶対！
The saving of copies of paper has been under way throughout the company.,A campaign is underway throughout the company to achieve economy in the use of copying paper.,コピー紙の節約運動が全社的に展開されている。
The interest that happened will be cashed into the savings account.,Accrued interest will be paid into your account.,生じた利息は貯金口座に入金されます。
Children are very sensitive to mother's affection.,A child is very sensitive to its mother's love.,子供は母親の愛情にとても敏感だ。
acid rain isn't a natural phenomenon.,Acid rain isn't a natural phenomenon.,酸性雨は自然現象ではない。
Our girl friend traveled to a small town last week.,A female friend of ours took a trip to a small village last week.,我々の女の友達は先週小さな町へ旅行しました。
A love developed between them.,Affection sprang up between them.,二人の間に愛情が芽生えた。
The boss tried to change his large-scale management strategy after he had gone through with it.,"After consideration, the company president made a large scale change to the management strategy.",社長は逡巡した後に、大規模な経営戦略の転換を図った。
"After I finished my homework, I went to bed.","After I had done my homework, I went to bed.",宿題を終えた後で私は寝た。
"After dinner, I washed the dishes.","After supper, I washed the dishes.",夕食後、私は皿を洗った。


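Eyeballing the table works for ten rows, but a rough number can help when comparing candidate models. The sketch below computes a token-overlap F1 between a translation and its reference. It is a crude stand-in for standard metrics such as BLEU, and the names `tokenize` and `unigram_f1` are hypothetical. The three pairs are copied from the table above.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into word tokens (apostrophes stay inside words)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated and a reference sentence."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return 0.0
    # Count shared tokens, respecting how often each one appears.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# (model output, ground truth) pairs copied from the table above.
pairs = [
    ("Never!", "Absolutely!"),
    ("acid rain isn't a natural phenomenon.", "Acid rain isn't a natural phenomenon."),
    ("After dinner, I washed the dishes.", "After supper, I washed the dishes."),
]
for pred, ref in pairs:
    print(f"{unigram_f1(pred, ref):.2f}  {pred!r}")
```

In the notebook, `translation_results_df.apply(lambda row: unigram_f1(row["translation_text"], row["English"]), axis=1)` would score every row. Keep the limits in mind: a correct paraphrase that uses different words also scores low.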
### Question 3: Few-shot learning

In this section, you will build a prompt that gets an LLM to solve a few-shot learning problem.  Your prompt will have 3 sections:

1. High-level instruction about the task
1. Examples of query-answer pairs for the LLM to learn from
1. New query

Your goal is to get the LLM to give the best possible response to the new query.

More specifically, your prompt should follow this template:
```
<High-level instruction about the task: Given input_label, generate output_label.>:

[<input_label>]: "<input text>"
[<output_label>]: "<output_text>"
###
[<input_label>]: "<input text>"
[<output_label>]: "<output_text>"
###
[<input_label>]: "<input text>"
[<output_label>]:
```
where the final two lines represent the new query.
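To avoid formatting slips when filling in the template by hand, the prompt can also be assembled programmatically. This is a sketch: `build_few_shot_prompt` is a hypothetical helper, and the example pairs illustrate the "identify the subject" idea from the list of ideas.

```python
def build_few_shot_prompt(instruction: str, input_label: str, output_label: str,
                          examples: list, query: str) -> str:
    """Assemble a prompt following the template above (hypothetical helper).

    examples is a list of (input_text, output_text) pairs.
    """
    blocks = [
        f'[{input_label}]: "{inp}"\n[{output_label}]: "{out}"'
        for inp, out in examples
    ]
    # The new query leaves the output slot empty for the model to complete.
    blocks.append(f'[{input_label}]: "{query}"\n[{output_label}]:')
    return f"{instruction}:\n\n" + "\n###\n".join(blocks)


prompt = build_few_shot_prompt(
    instruction="Given a sentence, extract the subject",
    input_label="sentence",
    output_label="subject",
    examples=[
        ("The cat sat on the mat.", "cat"),
        ("Marie Curie won two Nobel Prizes.", "Marie Curie"),
    ],
    query="The old bridge collapsed during the storm.",
)
print(prompt)
```

Adding or removing examples then becomes a one-line change, which makes it easy to try 3 or 4 examples instead of 1 or 2.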

It is up to you to choose a task, but here are some ideas:
* Translation: This is easy but less interesting since there are already models fine-tuned for translation.  You can generate examples via a tool like Google Translate.
* Create book titles or descriptions: Given a book title, generate a description, or vice versa.  You can get examples from Wikipedia.
* Generate tweets: Given keywords or a key message, generate a tweet.
* Identify the subject: Given a sentence, extract the noun or name of the subject of the sentence.

*Please **do not** copy examples from the demo notebook.*

Tips:
* If the model gives bad outputs with only 1 or 2 examples, try adding more.  3 or 4 examples can be much better than 1 or 2.
* Not all tasks are equally difficult.  If your task is too challenging, try a different one.

In [0]:
few_shot_pipeline = pipeline(
    task="text-generation",
    model="EleutherAI/gpt-neo-1.3B",
    max_new_tokens=50,
    model_kwargs={"cache_dir": DA.paths.datasets},
)  # Use a predownloaded model

# Get the token ID for "###", which we will use as the EOS token below.  (Recall we did this in the demo notebook.)
eos_token_id = few_shot_pipeline.tokenizer.encode("###")[0]

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.



Fill in the template below.  Feel free to adjust the number of examples.

In [0]:
# Fill in this template.

prompt =\
"""Given a product, generate a catchy advertising slogan for it:

[product]: "BMW"
[slogan]: "Always driven, with BMW"
###
[product]: "Singapore Airlines"
[slogan]: "Get away, with Singapore Airlines"
###
[product]: "iPhone"
[slogan]: "Your phone, your world, with iPhone"
###
[product]: "LG OLED TV"
[slogan]: "See Perfection, with LG OLED TV"
###
[product]: "Rolex"
[slogan]: "Be Timeless, with Rolex"
###
[product]: "IKEA"
[slogan]: """

In [0]:
results = few_shot_pipeline(prompt, do_sample=True, eos_token_id=eos_token_id)

print(results[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


Given a product, generate a catchy advertising slogan for it:

[product]: "BMW"
[slogan]: "Always driven, with BMW"
###
[product]: "Singapore Airlines"
[slogan]: "Get away, with Singapore Airlines"
###
[product]: "iPhone"
[slogan]: "Your phone, your world, with iPhone"
###
[product]: "LG OLED TV"
[slogan]: "See Perfection, with LG OLED TV"
###
[product]: "Rolex"
[slogan]: "Be Timeless, with Rolex"
###
[product]: "IKEA"
[slogan]:  "Home of the Family"
###
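By default, a `text-generation` pipeline returns the prompt followed by the continuation, so the new slogan sits at the very end of `generated_text`. A small post-processing step can pull it out. `extract_answer` is a hypothetical helper; cutting at `###` mirrors the separator used as the EOS token above.

```python
def extract_answer(generated_text: str, prompt: str, stop: str = "###") -> str:
    """Return only the model's answer to the final query (hypothetical helper)."""
    # Drop the echoed prompt if the pipeline returned it.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Cut at the first separator in case the model kept generating.
    answer = completion.split(stop, 1)[0]
    # Remove surrounding whitespace and the quotes used in the template.
    return answer.strip().strip('"')


demo_prompt = '[product]: "IKEA"\n[slogan]: '
demo_output = demo_prompt + ' "Home of the Family"\n###'
print(extract_answer(demo_output, demo_prompt))
```

In this notebook the call would be `extract_answer(results[0]["generated_text"], prompt)`.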


In [0]:

# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion1_3(few_shot_pipeline, prompt, results[0]["generated_text"])

[32mPASSED[0m: All tests passed for lesson1, question3
[32mRESULTS RECORDED[0m: Click `Submit` when all questions are completed to log the results.


## Explore model and tokenizer settings

So far, we have used pipelines in a very basic way, without worrying about configuration options.  In this section, you will explore the various options for models and tokenizers to learn how they affect LLM behavior.

We will load a dataset, tokenizer, and model for you.  We will also define a helper function for displaying the ground-truth and generated summaries side by side.

In [0]:
# Load data, tokenizer, and model.

from transformers import T5Tokenizer, T5ForConditionalGeneration

xsum_dataset = load_dataset("xsum", version="1.2.0", cache_dir=DA.paths.datasets)
xsum_sample = xsum_dataset["train"].select(range(10))

tokenizer = T5Tokenizer.from_pretrained("t5-small", cache_dir=DA.paths.datasets)
model = T5ForConditionalGeneration.from_pretrained(
    "t5-small", cache_dir=DA.paths.datasets
)

# Prepare articles for T5, which requires a "summarize: " prefix.
articles = ["summarize: " + article for article in xsum_sample["document"]]

Found cached dataset xsum (/dbfs/mnt/dbacademy-users/labuser5455921@vocareum.com/large-language-models/datasets/xsum/default/1.2.0/082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71)



In [0]:
import pandas as pd

def display_summaries(decoded_summaries: list) -> None:
    """Helper method to display ground-truth and generated summaries side-by-side"""
    results_df = pd.DataFrame(
        list(zip(xsum_sample["summary"], decoded_summaries)),
        columns=["Summary", "Generated"],
    )
    display(results_df)  # `display` is provided by the Databricks notebook environment

### Open-ended exploration

In the cells below, we provide code for running the tokenizer and model on the articles.  Your task is to play around with the various configurations to gain more intuition about their effects.  Look in particular for changes to output quality and running time, and remember that with sampling enabled (`do_sample=True`), running the same code twice may produce different results.
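With `do_sample=True`, every next token is drawn at random, which is why reruns can differ. The pure-Python sketch below uses a made-up four-token distribution (not T5) and hypothetical helper names (`sample_token`, `sample_sequence`) to show that draws repeat exactly only when the random generator starts from the same seed. In the notebook itself, `transformers.set_seed` plays the analogous role by seeding Python, NumPy, and PyTorch at once; the sketch does not call it.

```python
import random

# Made-up next-token probabilities, for illustration only (not from T5).
TOY_PROBS = {"the": 0.5, "a": 0.3, "storm": 0.15, "banana": 0.05}


def sample_token(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token according to its probability (hypothetical helper)."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def sample_sequence(seed: int, n: int = 8) -> list[str]:
    """Draw n tokens from a generator seeded with `seed` (hypothetical helper)."""
    rng = random.Random(seed)
    return [sample_token(TOY_PROBS, rng) for _ in range(n)]


run_a = sample_sequence(seed=42)
run_b = sample_sequence(seed=42)  # same seed: identical draws
run_c = sample_sequence(seed=7)   # different seed: usually different draws
print(run_a == run_b)  # True
print(run_a)
print(run_c)
```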

Below, we list brief descriptions of each of the parameters you may wish to tweak.
* Tokenizer encoding
  * `max_length`: This caps the maximum input length.  It must be at or below the model's input length limit.
  * `return_tensors`: Do not change this one.  This tells Hugging Face to return tensors in PyTorch ("pt") format.
* Model
  * `do_sample`: True or False.  This tells the model whether or not to use sampling in generation.  If False, it uses greedy search (`num_beams=1`) or beam search (`num_beams` > 1).  If True, it samples the next token at random, optionally restricted by top-k and/or top-p filtering; combined with `num_beams` > 1, as in the cells below, Hugging Face performs beam-search multinomial sampling.  See the blog post linked below for more details on sampling techniques.
  * `num_beams`: (for beam search) This specifies the number of beams to use in beam search across possible sequences.  Increasing the number can help the model to find better sequences, at the cost of more computation.
  * `min_length`, `max_length`: Generative models can be instructed to generate new text between these token lengths.
  * `top_k`: (for sampling) This controls the use of top-K sampling, which forces sampling to ignore low-probability tokens by limiting to the K most probable next tokens.  Set to 0 to disable top-K sampling.
  * `top_p`: (for sampling) This controls the use of top-p (nucleus) sampling, which ignores low-probability tokens by sampling only from the smallest set of most probable tokens whose cumulative probability reaches p.  Set to 1.0 to disable top-p sampling.
  * `temperature`: (for sampling) This controls the "temperature" of the softmax.  Lower values bias further towards high-probability next tokens, and as the temperature approaches 0, sampling approaches greedy search.  Hugging Face requires a strictly positive value, so use `do_sample=False` if you want true greedy decoding.
* Tokenizer decoding
  * `skip_special_tokens`: True or False.  This allows you to skip special tokens (like EOS tokens) in the model outputs.
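To see how `do_sample`, `min_length`, `max_length`, and `skip_special_tokens` interact, here is a deliberately tiny decoding loop. It is a sketch under stated assumptions, not Hugging Face's `generate`: the "model" `toy_next_token_probs` is a hypothetical function returning made-up probabilities, `</s>` stands in for the EOS token, and `min_length` is enforced by forbidding EOS until enough tokens exist (similar in spirit to Hugging Face's `MinLengthLogitsProcessor`).

```python
import random

EOS = "</s>"  # stand-in for the end-of-sequence special token


def toy_next_token_probs(prefix: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for a model: made-up next-token probabilities."""
    if len(prefix) >= 3:
        return {EOS: 0.6, "today": 0.3, "again": 0.1}
    return {"flood": 0.4, EOS: 0.35, "warning": 0.25}


def generate(do_sample: bool, min_length: int, max_length: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    out: list[str] = []
    while len(out) < max_length:  # max_length caps the number of generated tokens
        probs = dict(toy_next_token_probs(out))
        if len(out) < min_length:
            probs.pop(EOS, None)  # min_length: EOS is not allowed yet
        if do_sample:
            tokens = list(probs)
            token = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
        else:
            token = max(probs, key=probs.get)  # greedy: take the most probable token
        out.append(token)
        if token == EOS:
            break
    return out


def decode(tokens: list[str], skip_special_tokens: bool) -> str:
    kept = [t for t in tokens if not (skip_special_tokens and t == EOS)]
    return " ".join(kept)


greedy = generate(do_sample=False, min_length=0, max_length=10)
print(decode(greedy, skip_special_tokens=False))  # flood flood flood </s>
print(decode(greedy, skip_special_tokens=True))   # flood flood flood
print(generate(do_sample=False, min_length=5, max_length=10))
print(generate(do_sample=False, min_length=0, max_length=2))
print(generate(do_sample=True, min_length=0, max_length=10, seed=1))
```

The toy model ignores most of the prefix, so greedy decoding repeats itself here; that is a property of this toy, not of T5.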

Do not tweak:
* Tokenizer encoding
  * `padding`: True or False.  This handles variable-length inputs by adding padding to short inputs.  Since it should be set according to your task and data, do not change it for this exercise (unless you want to see which warnings or errors appear).
  * `truncation`: True or False.  This handles variable-length inputs by truncating very long inputs.  Since it should be set according to your task and data, do not change it for this exercise (unless you want to see which warnings or errors appear).
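The sketch below mimics what `padding=True` and `truncation=True` (with `max_length`) do to a batch of token-id lists, and why the cells pass `attention_mask` to `generate`. It is simplified: the ids are made up, padding goes on the right, and, unlike the real tokenizer, it ignores special tokens such as the EOS token the tokenizer appends. The pad id 0 matches T5's pad token, but any placeholder id would do for illustration; `pad_and_truncate` is a hypothetical helper.

```python
PAD_ID = 0  # T5's pad token id; any placeholder id works for this sketch


def pad_and_truncate(batch: list[list[int]], max_length: int, pad_id: int = PAD_ID):
    """Truncate each sequence to max_length, then right-pad to the longest one."""
    truncated = [ids[:max_length] for ids in batch]      # truncation=True
    target_len = max(len(ids) for ids in truncated)      # padding=True pads to the longest
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask


batch = [[5, 6, 7], [8, 9], [1, 2, 3, 4, 5, 6]]  # made-up token ids
ids, mask = pad_and_truncate(batch, max_length=4)
print(ids)   # [[5, 6, 7, 0], [8, 9, 0, 0], [1, 2, 3, 4]]
print(mask)  # [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
```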

For more information about these parameters, see the `help()` calls in the cells below or search the Hugging Face docs.  Some useful links:
* Tokenizer call for encoding: [PreTrainedTokenizerBase.\_\_call\_\_ API docs](https://huggingface.co/docs/transformers/v4.28.1/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__)
* Model invocation: [Docs for generation strategies](https://huggingface.co/docs/transformers/main/en/generation_strategies) and this blog post on ["How to generate text: using different decoding methods for language generation with Transformers"](https://huggingface.co/blog/how-to-generate)

If you make a mistake and can't get back to a working state, use the Revision History to revert your changes.
Open it via the clock icon or the "Revision History" button in the top-right corner of the notebook page (see screenshot below).

![Screenshot of notebook Revision History](https://files.training.databricks.com/images/llm/revision_history.png)

### Default

In [0]:
##############################################################################
# TODO: Try editing the parameters in this section, and see how they affect the results.
#       You can also copy and edit the cell to compare results across different parameter settings.
#
# We show all parameter settings for ease-of-modification, but in practice, you would only set relevant ones.
inputs = tokenizer(
    articles, max_length=1024, return_tensors="pt", padding=True, truncation=True
)

summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    do_sample=True,
    num_beams=2,
    min_length=0,
    max_length=40,
    top_k=20,
    top_p=0.5,
    temperature=0.7,
)

decoded_summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
##############################################################################

display_summaries(decoded_summaries)

| Summary | Generated |
| --- | --- |
| Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank. | the full cost of damage in Newton Stewart is still being assessed. many roads in peeblesshire remain badly affected by standing water. a flood alert remains in place across the |
| Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre. | fire alarm went off at the Holiday Inn in Hope Street on Saturday. guests were asked to leave the hotel. one of the two buses is from germany, the other from china |
| Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg. | stewards handed reprimand to f1 team-mate. f1 chief says he is "very happy" with qualifying. f1 chief says |
| A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told. | the 67-year-old is accused of committing the offences between March 1972 and October 1989. he denies all the charges, including two counts of indecency |
| An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report. | a man receiving treatment at the clinic threatened to shoot himself and others. he was evacuated from the hospital. the incident comes amid tension in Istanbul following several attacks. the |
| Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards. | the dragons gave a first start of the season to wing aled Brew and hooker Elliot Dee. it took 24 minutes for a disjo |
| A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police. | Veronica Vanessa Chango-Alverez, 31, died in the crash in Streatham, croydon. the driver fled the scene abandoning the grey Audi, which |
| Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie. | the 25-year-old was hit by a motorbike during the Gent-Wevelgem race. the race passed through northern France and involved 50 motorbikes. the |
| Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury. | arsenal striker says he will not rush his return to the premier league. the 26-year-old will not be fit for the start of the season at Brighton on 12 august |
| A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with "serious life-changing injuries". | the crash happened about 07:20 GMT at the junction of the A127 and Progress Road. the man, aged in his 20s, was treated at the scene for a head injury |


### Beam Search
---
`num_beams` increased from 2 to 6; all other settings match the Default cell.
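Beam search keeps the `num_beams` highest-scoring partial sequences at each step instead of committing to the single best token. The sketch below runs plain, deterministic beam search (the `do_sample=False` case) on a hypothetical two-step model with made-up probabilities; it omits details such as length penalties, and the lab cells instead combine beams with sampling. With one beam it reduces to greedy search and commits to "the" early; with two beams it keeps "a" alive and finds a more probable full sequence. A larger beam can, but need not, change the final output.

```python
import math

# Hypothetical two-step "model": P(first token) and P(second token | first token).
FIRST = {"the": 0.5, "a": 0.4, "one": 0.1}
SECOND = {
    "the": {"storm": 0.3, "flood": 0.3, "rain": 0.4},
    "a": {"storm": 0.9, "flood": 0.05, "rain": 0.05},
    "one": {"storm": 0.5, "flood": 0.5},
}


def next_probs(prefix: list[str]) -> dict[str, float]:
    return FIRST if not prefix else SECOND[prefix[-1]]


def beam_search(num_beams: int) -> tuple[list[str], float]:
    """Return the best two-token sequence and its cumulative log-probability."""
    beams: list[tuple[list[str], float]] = [([], 0.0)]
    for _ in range(2):  # the toy model only defines two steps
        candidates = [
            (tokens + [tok], score + math.log(p))
            for tokens, score in beams
            for tok, p in next_probs(tokens).items()
        ]
        # Keep only the num_beams best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]


for k in (1, 2):
    tokens, logp = beam_search(num_beams=k)
    print(k, tokens, round(math.exp(logp), 2))
# 1 ['the', 'rain'] 0.2
# 2 ['a', 'storm'] 0.36
```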

In [0]:
inputs = tokenizer(
    articles, max_length=1024, return_tensors="pt", padding=True, truncation=True
)

summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    do_sample=True,
    num_beams=6,
    min_length=0,
    max_length=40,
    top_k=20,
    top_p=0.5,
    temperature=0.7,
)

decoded_summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
##############################################################################

display_summaries(decoded_summaries)

| Summary | Generated |
| --- | --- |
| Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank. | the full cost of damage in Newton Stewart is still being assessed. many roads in peeblesshire remain badly affected by standing water. a flood alert remains in place across the |
| Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre. | fire alarm went off at the Holiday Inn in Hope Street on Saturday. guests were asked to leave the hotel. one of the two buses is from germany, the other from china |
| Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg. | stewards handed reprimand to f1 team-mate. f1 chief says he is "very happy" with qualifying. f1 chief says |
| A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told. | the 67-year-old is accused of committing the offences between March 1972 and October 1989. he denies all the charges, including two counts of indecency |
| An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report. | a man receiving treatment at the clinic threatened to shoot himself and others. he was evacuated from the hospital. the incident comes amid tension in Istanbul following several attacks. the |
| Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards. | the dragons gave a first start of the season to wing aled Brew and hooker Elliot Dee. it took 24 minutes for a disjo |
| A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police. | Veronica Vanessa Chango-Alverez, 31, died in the crash in Streatham, croydon. the driver fled the scene abandoning the grey Audi, which |
| Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie. | the 25-year-old was hit by a motorbike during the Gent-Wevelgem race. the race passed through northern France and involved 50 motorbikes. the |
| Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury. | arsenal striker says he will not rush his return to the premier league. the 26-year-old will not be fit for the start of the season at Brighton on 12 august |
| A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with "serious life-changing injuries". | the crash happened about 07:20 GMT at the junction of the A127 and Progress Road. the man, aged in his 20s, was treated at the scene for a head injury |


### Temperature
---
`temperature` lowered from 0.7 to 0.2; all other settings match the Default cell.
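Temperature divides the model's logits by T before the softmax. The sketch below applies that formula to three made-up logits: as T drops from 1.0 to 0.2, probability mass concentrates on the top token. It rejects T <= 0, mirroring Hugging Face's requirement of a strictly positive temperature. It illustrates the formula only and is not Hugging Face's implementation; `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    scaled = {tok: x / temperature for tok, x in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(x - top) for tok, x in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


logits = {"storm": 2.0, "flood": 1.0, "banana": -1.0}  # made-up scores
for t in (1.0, 0.7, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

On its own, this formula predicts more conservative output at T = 0.2, so it does not by itself account for the garbled summaries this cell produces.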

In [0]:
inputs = tokenizer(
    articles, max_length=1024, return_tensors="pt", padding=True, truncation=True
)

summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    do_sample=True,
    num_beams=2,
    min_length=0,
    max_length=40,
    top_k=20,
    top_p=0.5,
    temperature=0.2,
)

decoded_summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
##############################################################################

display_summaries(decoded_summaries)

| Summary | Generated |
| --- | --- |
| Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank. | the full cost of damage in Newton Stewart is still being assessedwel Alabama résultats demonstrate elementsggy Israel Chairmanaimed pendant2). Spring prideemploi erklärt violateganzeții Arbeits450 activitate Mattatteétantlit subjects dairy |
| Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre. | fire alarm went off at the Holiday Inn in Hope street conserve] première within agree Tap sunset Gulfenie Jam operazăspire script surgical Tri (9Hu world happens contenu include Set residence rapidly with livres |
| Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg. | stewards handed reprimand to Ferrari after campaignsEF présence bank accurateglass courtsbay Housefungprozess N contact overseas efforts invit trainer poti (4IVspect meeting celle soccerfully Tahuman |
| A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told. | the 67-year-old is accused of committing licenceo Soft slide foi team heateddiesquo insulation DCprä strip consulting Got efficiently ladies acestui landscape dollars câte customer rally joined elect Gulf texte |
| An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report. | a man receiving treatment at the clinic threatened to shoot himself capture shout Institute Chineseoù shot evaluation défi multipli Design wie emb României get starter Leatherpaired keys réalisiunea regime messlichellen detectedful Komm |
| Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards. | the dragons gave a 1-0 lead after just 72 frameschinа Erfolg proposer soda Justin trackville chap wrappednight desk aero experiment popul onceroy Coloring2,000 lower still acts Premier realise decentjur |
| A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police. | Veronica Vanessa Chango-Alverez, 31, was lodgecredit bune plea owners steering Language Gefühl you character trackkeeper GartenDr recipient Communications suspend Cu Da120 ar memories candidatespixelberry debris |
| Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie. | the 25-year-old was hit by a motor tireiereilykyboywhiteSchw assault rocksyer JuanA school dairyKA potatoes stressedPORT Vis Perfect atât Heart expanding step afin Photo Adventure |
| Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury. | arsenal striker says he can see the finishing line following reign femaleMO yes Additional Partners spectacular also figures applyargPap unbe full obstacles compassion wird breaks recommendations Geb top agenciesrott dealt faciButtec |
| A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with "serious life-changing injuries". | the crash happened about 07:20 GMT at the junction of two Through purative NO Over monthly show Writing abvă filters architectural eigentlich messagesdigit 2015 cosmetic samples legal,” Spark Maxire talk sh developer |


### Top_k
---
`top_k` increased from 20 to 50; all other settings match the Default cell.
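Top-k filtering keeps only the k most probable next tokens and renormalizes before sampling. The sketch works on a made-up probability table with a hypothetical helper `top_k_filter`; Hugging Face applies the same idea to logits (setting the excluded ones to negative infinity before the softmax), which yields the same renormalized distribution. As in the parameter list above, k = 0 disables the filter.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and renormalize; k <= 0 disables filtering."""
    if k <= 0 or k >= len(probs):
        return dict(probs)
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


probs = {"storm": 0.5, "flood": 0.3, "rain": 0.15, "banana": 0.05}  # made-up
print(top_k_filter(probs, 2))  # only "storm" and "flood" remain
print(top_k_filter(probs, 0))  # unchanged
```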

In [0]:
inputs = tokenizer(
    articles, max_length=1024, return_tensors="pt", padding=True, truncation=True
)

summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    do_sample=True,
    num_beams=2,
    min_length=0,
    max_length=40,
    top_k=50,
    top_p=0.5,
    temperature=0.7,
)

decoded_summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
##############################################################################

display_summaries(decoded_summaries)

| Summary | Generated |
| --- | --- |
| Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank. | the full cost of damage in Newton Stewart is still being assessed. many roads in peeblesshire remain badly affected by standing water. a flood alert remains in place across the |
| Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre. | fire alarm went off at the Holiday Inn in Hope Street on Saturday. guests were asked to leave the hotel. one of the two buses is from germany, the other from china |
| Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg. | stewards handed reprimand to f1 team-mate. f1 chief says he is in good shape for tomorrow. stewards say |
| A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told. | the 67-year-old is accused of committing the offences between March 1972 and October 1989. he denies all the charges, including two counts of indecency |
| An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report. | a man receiving treatment at the clinic threatened to shoot himself and others. he was evacuated from the hospital. the incident comes amid tension in Istanbul following several attacks. the |
| Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards. | the dragons gave a crucial try to aston villa's aled Brew and junior bulgumakau. Gregor Townsend gave a debut to power |
| A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police. | Veronica Vanessa Chango-Alverez, 31, died in the crash in Streatham, croydon. the driver fled the scene abandoning a grey Audi, |
| Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie. | the 25-year-old was hit by a motorbike during the Gent-Wevelgem race. the race passed through northern France and involved 50 motorbikes. the |
| Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury. | arsenal striker says he can see the finishing line after tearing cruciate knee ligaments. the 26-year-old will not be fit for the start of the premier league |
| A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with "serious life-changing injuries". | the crash happened about 07:20 GMT at the junction of the A127 and Progress Road. the man, aged in his 20s, was treated at the scene for a head injury |


### Top_p
---
`top_p` reduced from 0.5 to 0.1; all other settings match the Default cell.
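Top-p (nucleus) filtering keeps the smallest set of most-probable tokens whose cumulative probability reaches p, then renormalizes. On the made-up table below, p = 0.1 and p = 0.5 both leave a single candidate, so sampling becomes effectively greedy for that step, while p = 1.0 disables the filter. `top_p_filter` is a hypothetical helper, and Hugging Face's implementation may treat the exact boundary slightly differently; treat this as an illustration of the idea.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    if top_p >= 1.0:
        return dict(probs)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))  # always keeps at least the top token
        mass += p
        if mass >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


probs = {"storm": 0.5, "flood": 0.3, "rain": 0.15, "banana": 0.05}  # made-up
for threshold in (0.1, 0.5, 0.7, 1.0):
    print(threshold, sorted(top_p_filter(probs, threshold)))
```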

In [0]:
inputs = tokenizer(
    articles, max_length=1024, return_tensors="pt", padding=True, truncation=True
)

summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    do_sample=True,
    num_beams=2,
    min_length=0,
    max_length=40,
    top_k=20,
    top_p=0.1,
    temperature=0.7,
)

decoded_summaries = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
##############################################################################

display_summaries(decoded_summaries)

| Summary | Generated |
| --- | --- |
| Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank. | the full cost of damage in Newton Stewart is still being assessed. many roads in peeblesshire remain badly affected by standing water. the water breached a retaining |
| Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre. | fire alarm went off at the Holiday Inn in Hope Street on Saturday. guests were asked to leave the hotel. one of the two buses is from germany, the other from china |
| Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg. | stewards handed reprimand to f1 team-mate. f1 chief says he is "very happy" with qualifying. f1 chief says |
| A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told. | the 67-year-old is accused of committing the offences between March 1972 and October 1989. he denies all the charges, including two counts of indecency |
| An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report. | a man receiving treatment at the clinic threatened to shoot himself and others. he was evacuated from the hospital. the incident comes amid tension in Istanbul following several attacks. the |
| Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards. | the dragons gave a 1-0 lead in the second period of the game. the result was a 63rd minute try from a rolling maul. the |
| A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police. | Veronica Vanessa Chango-Alverez, 31, died in the crash in Streatham, croydon. the driver fled the scene abandoning the grey Audi, which |
| Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie. | the 25-year-old was hit by a motorbike during the Gent-Wevelgem race. the race passed through northern France and involved 50 motorbikes. the |
| Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury. | arsenal striker says he can see the finishing line after tearing cruciate knee ligaments. the 26-year-old will not be fit for the start of the premier league |
| A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with "serious life-changing injuries". | the crash happened about 07:20 GMT at the junction of the A127 and Progress Road. the man, aged in his 20s, was treated at the scene for a head injury |


Run the `help()` calls below as needed to see the docstrings for each stage of the pipeline.

In [0]:
# Options for calling the tokenizer (lots to see here)
help(tokenizer.__call__)

Help on method __call__ in module transformers.tokenization_utils_base:

__call__(text: Union[str, List[str], List[List[str]]] = None, text_pair: Union[str, List[str], List[List[str]], NoneType] = None, text_target: Union[str, List[str], List[List[str]]] = None, text_pair_target: Union[str, List[str], List[List[str]], NoneType] = None, add_special_tokens: bool = True, padding: Union[bool, str, transformers.utils.generic.PaddingStrategy] = False, truncation: Union[bool, str, transformers.tokenization_utils_base.TruncationStrategy] = None, max_length: Optional[int] = None, stride: int = 0, is_split_into_words: bool = False, pad_to_multiple_of: Optional[int] = None, return_tensors: Union[str, transformers.utils.generic.TensorType, NoneType] = None, return_token_type_ids: Optional[bool] = None, return_attention_mask: Optional[bool] = None, return_overflowing_tokens: bool = False, return_special_tokens_mask: bool = False, return_offsets_mapping: bool = False, return_length: bool = False, ve

In [0]:
# Options for invoking the model (lots to see here)
help(model.generate)

Help on method generate in module transformers.generation.utils:

generate(inputs: Optional[torch.Tensor] = None, generation_config: Optional[transformers.generation.configuration_utils.GenerationConfig] = None, logits_processor: Optional[transformers.generation.logits_process.LogitsProcessorList] = None, stopping_criteria: Optional[transformers.generation.stopping_criteria.StoppingCriteriaList] = None, prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor], List[int]]] = None, synced_gpus: Optional[bool] = None, assistant_model: Optional[ForwardRef('PreTrainedModel')] = None, streamer: Optional[ForwardRef('BaseStreamer')] = None, **kwargs) -> Union[transformers.generation.utils.GreedySearchEncoderDecoderOutput, transformers.generation.utils.GreedySearchDecoderOnlyOutput, transformers.generation.utils.SampleEncoderDecoderOutput, transformers.generation.utils.SampleDecoderOnlyOutput, transformers.generation.utils.BeamSearchEncoderDecoderOutput, transformers.generation.utils.Bea

In [0]:
# Options for calling the tokenizer for decoding (not much to see here)
help(tokenizer.batch_decode)

Help on method batch_decode in module transformers.tokenization_utils_base:

batch_decode(sequences: Union[List[int], List[List[int]], ForwardRef('np.ndarray'), ForwardRef('torch.Tensor'), ForwardRef('tf.Tensor')], skip_special_tokens: bool = False, clean_up_tokenization_spaces: bool = None, **kwargs) -> List[str] method of transformers.models.t5.tokenization_t5.T5Tokenizer instance
    Convert a list of lists of token ids into a list of strings by calling decode.
    
    Args:
        sequences (`Union[List[int], List[List[int]], np.ndarray, torch.Tensor, tf.Tensor]`):
            List of tokenized input ids. Can be obtained using the `__call__` method.
        skip_special_tokens (`bool`, *optional*, defaults to `False`):
            Whether or not to remove special tokens in the decoding.
        clean_up_tokenization_spaces (`bool`, *optional*):
            Whether or not to clean up the tokenization spaces. If `None`, will default to
            `self.clean_up_tokenization_spaces

## Submit your Results (edX Verified Only)

To get credit for this lab, click the submit button in the top right to report the results. If you run into any issues, click `Run` -> `Clear state and run all`, and make sure all tests have passed before re-submitting. If you accidentally deleted any tests, take a look at the notebook's version history to recover them or reload the notebooks.

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>